Preface



The joint workshop of the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB and the Vision and Fusion Laboratory (Institute for Anthropomatics, Karlsruhe Institute of Technology (KIT)) has been organized annually since 2005 with the aim of reporting on the latest research and development findings of the doctoral students of both institutions. The workshop provides a forum for scientific discussion and debate on the presented results. Furthermore, the personal exchange among the doctoral students affords the opportunity of identifying new perspectives, future research directions, and collaborations between the students. The 2009 joint workshop was held in La Bresse, France, from August 2 to 6. This book provides a collection of technical reports on the research results presented at the 2009 workshop.

The editors would like to thank all the organizers of the workshop for their efforts, which led to a pleasant and rewarding stay in France. The editors would also like to thank the doctoral students for writing and reviewing the technical reports and for responding to comments and suggestions of their colleagues. It is hoped that this collection of technical reports forms a valuable addition to the scientific and developmental knowledge in the main research fields of the Vision and Fusion Laboratory and Fraunhofer IOSB, which are image processing, pattern recognition, system technologies, and information fusion.



Prof. Dr.-Ing. Jürgen Beyerer
Dr.-Ing. Marco Huber 
Convex Optimization Approaches to
Long-Term Sensor Scheduling



Marco F. Huber

Variable Image Acquisition and Processing Research Group (VBV) 
Fraunhofer Institute of Optronics, System Technologies and 
Image Exploitation IOSB 
marco.huber@ieee.org 

Technical Report IES-2009-01 

Abstract: The optimization over long time horizons in order to consider long- 
term effects is of paramount importance for effective sensor scheduling in 
multi-sensor systems like sensor arrays or sensor networks. Determining the 
optimal sensor schedule, however, is equivalent to solving a binary integer 
program, which is computationally demanding for long time horizons and 
many sensors. For linear Gaussian models, two efficient long-term sensor 
scheduling approaches are proposed in this report. The first approach deter- 
mines approximate but close to optimal sensor schedules via convex optimiza- 
tion. The second approach combines convex optimization with a branch-and- 
bound search for efficiently determining the optimal sensor schedule. Both 
approaches are compared by means of numerical simulations. 



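Notation

$x$, $\underline{x}$ : deterministic variable/vector

$\boldsymbol{x}$, $\underline{\boldsymbol{x}}$ : random variable/vector

$\hat{x}$, $\hat{\underline{x}}$ : mean value of random variable/vector

$\underline{x}_k$ / $\underline{x}_{1:k}$ : vector at time step $k$ / sequence of vectors from time step 1 to $k$

$\mathcal{A}$ : general set

$\mathbf{A}$ : general matrix

$|\mathbf{A}|$ : matrix determinant

$\mathcal{N}(\underline{x}; \hat{\underline{x}}, \mathbf{C})$ : multivariate Gaussian density with mean $\hat{\underline{x}}$ and covariance $\mathbf{C}$

$\underline{u}_{1:k}$ / $\underline{u}^*_{1:k}$ : sensor schedule for time steps 1 to $k$ / optimal sensor schedule

$\underline{u}^l_{1:k}$ / $\underline{u}^u_{1:k}$ : sensor schedule obtained via convex optimization (real-valued, provides lower bound) / conversion (binary-valued, provides upper bound)

$J(\underline{u}_{1:k})$ : objective function value of sensor schedule $\underline{u}_{1:k}$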
1 Introduction

Recent developments in wireless communication and sensor technology facilitate building and deploying sensor systems for smart and persistent surveillance. For instance, sensor networks consisting of numerous inexpensive sensor nodes are a popular subject in research and practice for monitoring physical phenomena including, e.g., temperature and humidity distributions, biochemical concentrations, or vibrations in buildings [ASSC02]. For many such sensor systems, it is necessary to balance between maximizing the information gain and minimizing the consumption of limited resources like energy, computing power, or communication bandwidth. Sensor scheduling, which is also referred to as sensor selection, allows trading off these conflicting goals and forms the basis for efficient and intelligent processing of the sensor data.

A sensor schedule specifies a time sequence of sensors to be allocated for performing future measurements. The main objective is to allocate the sensors in the most informative way, which requires making decisions involving multiple time steps ahead. In this report, sensor scheduling for linear Gaussian dynamics and sensor models is studied, where one out of a set of sensors is selected at each time instant for performing a measurement. For such models, one of the first works on long-term sensor scheduling can be found in [MPD67]. It is shown there that a separation principle holds, i.e., the sensor schedule can be determined independently of the control of the observed system and independently of the measurement values. The optimal sensor schedule then results from traversing, off-line, a decision tree consisting of all possible sensor sequences. In order to avoid enumerating all schedules in a brute-force fashion, which is of exponential complexity, optimal or suboptimal pruning techniques are employed. Optimal techniques are guaranteed to yield the optimal sensor schedule without the need to examine all schedules (see for example [LI99, HH08]). Suboptimal methods such as those in [GCHM04] allow more significant savings in computational demand by giving up the guarantee of retaining the optimal schedule. Greedy, or myopic, scheduling algorithms represent an extreme case of suboptimal search, where a series of one-step-ahead solutions is calculated [Osh94, KS07].

As an alternative to traversing the decision tree, which corresponds to solving a binary integer program, convex optimization approaches have recently been proposed for solving sensor selection problems, i.e., problems of selecting the best $n$-element subset from a set of sensors (see [CMPS07, JB09]). These approaches can significantly improve the efficiency of determining informative sensor schedules, but so far they are not appropriate for optimal long-term sensor scheduling for arbitrary linear Gaussian dynamics and sensor models.
Both long-term sensor scheduling approaches proposed in this report overcome these restrictions. At first, a general sensor scheduling problem for linear Gaussian models is formulated in Section 2. In Section 3, it is shown that this sensor scheduling problem is a convex optimization problem when employing continuous relaxation of the decision variables. The first approach directly solves the resulting convex program, which leads to suboptimal but valuable sensor schedules without demanding much computation and memory. In order to provide the optimal sensor sequence, the second approach described in Section 4 utilizes branch-and-bound search for traversing a decision tree. To exclude complete subtrees containing suboptimal sensor schedules as early as possible, the solution of the convex optimization is used for calculating tight lower and upper bounds on the subtrees' values. The performance of the proposed approaches is demonstrated by means of simulations in Section 5, while in Section 6 conclusions and an outlook on future work are given.



2 Problem Formulation 

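In this report, the sensor scheduling problem for discrete-time linear Gaussian models is examined. The dynamics model of the observed system is given by

$$\underline{\boldsymbol{x}}_{k+1} = \mathbf{A}_k \cdot \underline{\boldsymbol{x}}_k + \underline{\boldsymbol{w}}_k . \tag{2.1}$$

A finite set $\mathcal{S}$ of sensors is considered for performing measurements, where the measurement $\underline{\boldsymbol{z}}^i_k$ of sensor $i \in \mathcal{S} = \{1, \ldots, S\}$ is related to the system state $\underline{\boldsymbol{x}}_k$ via the measurement model

$$\underline{\boldsymbol{z}}^i_k = \mathbf{H}^i_k \cdot \underline{\boldsymbol{x}}_k + \underline{\boldsymbol{v}}^i_k .$$

Both $\mathbf{A}_k$ and $\mathbf{H}^i_k$ are time-variant matrices. The noise terms $\underline{\boldsymbol{w}}_k$ and $\underline{\boldsymbol{v}}^i_k$ are zero-mean white Gaussian with covariance matrices $\mathbf{C}^w_k$ and $\mathbf{C}^{v,i}_k$, respectively. A measurement value $\hat{\underline{z}}^i_k$ of sensor $i \in \mathcal{S}$ is a realization of $\underline{\boldsymbol{z}}^i_k$. The initial system state $\underline{\boldsymbol{x}}_0 \sim \mathcal{N}(\underline{x}_0; \hat{\underline{x}}_0, \mathbf{C}^x_0)$ at time step $k = 0$ is Gaussian with mean $\hat{\underline{x}}_0$ and covariance $\mathbf{C}^x_0$.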

The aim of long-term sensor scheduling is to minimize (a scalar measure of) the covariance $\mathbf{C}^x_k$ of the state $\underline{\boldsymbol{x}}_k$, and thus the uncertainty of the state estimate, while considering the future behavior of the observed dynamical system and long-term sensing costs. For this purpose, the optimal sensor schedule $\underline{u}^*_{1:N} = [(\underline{u}^*_1)^{\mathrm{T}}, \ldots, (\underline{u}^*_N)^{\mathrm{T}}]^{\mathrm{T}} \in \{0,1\}^{S \cdot N}$ is determined over a finite $N$-step time horizon. Here, $\underline{u}^*_k = [u^*_{k,1}, \ldots, u^*_{k,S}]^{\mathrm{T}}$ encodes the index of the sensor scheduled for measurement at time step $k$, i.e., if sensor $i$ is scheduled at time step $k$, then $u_{k,i} = 1$ and $u_{k,j} = 0$ for all $j \neq i$.
For determining the optimal sensor schedule, the constrained optimization problem

$$\underline{u}^*_{1:N} = \arg\min_{\underline{u}_{1:N}} J(\underline{u}_{1:N}) \tag{2.2}$$

$$\text{subject to} \quad \sum_{k=1}^{N} \underline{c}_k^{\mathrm{T}} \cdot \underline{u}_k \leq C , \tag{2.3}$$

$$\underline{1}^{\mathrm{T}} \cdot \underline{u}_k = 1 , \quad k = 1, \ldots, N , \tag{2.4}$$

$$\underline{u}_k \in \{0,1\}^S , \quad k = 1, \ldots, N \tag{2.5}$$

is formulated, where $J(\underline{u}_{1:N}) = \sum_{k=1}^{N} g_k(\underline{u}_{1:k})$ is the cumulative objective function to be minimized. In (2.3), $\underline{c}_k = [c_{k,1}, \ldots, c_{k,S}]^{\mathrm{T}}$ contains the sensor costs $c_{k,i}$ (e.g., energy or communication) of selecting sensor $i$ at time step $k$. This constraint guarantees that a feasible sensor schedule does not exceed a maximum cost $C$. The scalar functions $g_k(\cdot)$, i.e., the summands of $J(\underline{u}_{1:N})$, quantify the uncertainty subsumed in $\mathbf{C}^x_k(\underline{u}_{1:k})$. They can be

• the trace operator $\mathrm{trace}(\mathbf{C}^x_k(\underline{u}_{1:k}))$, whose minimization corresponds (graphically speaking) to minimizing the diagonal of the rectangular box enclosing the covariance ellipsoid,

• the root-determinant $\sqrt{|\mathbf{C}^x_k(\underline{u}_{1:k})|}$, which leads to the minimization of the volume of the covariance ellipsoid, or

• the maximum eigenvalue $\lambda_{\max}(\mathbf{C}^x_k(\underline{u}_{1:k}))$, whose minimization corresponds to minimizing the largest principal axis of the covariance ellipsoid.
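For a 2x2 covariance block (for example, the position part of a state), the three scalar measures can be sketched in plain Python as below. The function names are hypothetical, and the closed-form eigenvalue formula assumes a symmetric matrix.

```python
import math


def trace_2x2(c):
    """Sum of the diagonal entries, i.e., of the eigenvalues."""
    return c[0][0] + c[1][1]


def root_det_2x2(c):
    """Square root of the determinant, proportional to the ellipse area."""
    return math.sqrt(c[0][0] * c[1][1] - c[0][1] * c[1][0])


def max_eig_2x2(c):
    """Largest eigenvalue of a symmetric 2x2 matrix in closed form."""
    mean = 0.5 * (c[0][0] + c[1][1])
    radius = math.sqrt((0.5 * (c[0][0] - c[1][1])) ** 2 + c[0][1] ** 2)
    return mean + radius


cov = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 3 and 1
print(trace_2x2(cov), root_det_2x2(cov), max_eig_2x2(cov))
```

For this matrix the three measures are 4, $\sqrt{3}$, and 3, so they generally rank schedules differently.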



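The covariance itself is given by the information form of the Kalman filter covariance recursion (see for example [KSH00])

$$\mathbf{C}^x_k(\underline{u}_{1:k}) = \Big( \big(\mathbf{A}_{k-1} \cdot \mathbf{C}^x_{k-1}(\underline{u}_{1:k-1}) \cdot \mathbf{A}_{k-1}^{\mathrm{T}} + \mathbf{C}^w_{k-1}\big)^{-1} + \sum_{i=1}^{S} u_{k,i} \cdot (\mathbf{H}^i_k)^{\mathrm{T}} \cdot (\mathbf{C}^{v,i}_k)^{-1} \cdot \mathbf{H}^i_k \Big)^{-1} , \tag{2.6}$$

commencing from $\mathbf{C}^x_0$.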
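A minimal sketch of recursion (2.6), assuming a scalar state so that every matrix reduces to a number. `covariance_sequence`, `a`, `w`, `c0`, and `info_gain` (the per-sensor information $M_i = (H^i)^2 / C^{v,i}$) are hypothetical names, and the weights in each `u_k` may be binary or relaxed.

```python
def covariance_sequence(schedule, a, w, c0, info_gain):
    """Scalar version of recursion (2.6).

    schedule  -- list of weight vectors u_k (one weight per sensor)
    a, w      -- scalar system 'matrix' A and process noise variance C^w
    c0        -- initial variance C^x_0
    info_gain -- per-sensor information M_i = h_i**2 / r_i
    """
    covs = []
    c = c0
    for u_k in schedule:
        predicted = a * c * a + w  # time update
        info = 1.0 / predicted + sum(u * m for u, m in zip(u_k, info_gain))
        c = 1.0 / info  # measurement update in information form
        covs.append(c)
    return covs


print(covariance_sequence([[1.0, 0.0], [0.0, 1.0]], a=1.0, w=1.0, c0=1.0, info_gain=[4.0, 1.0]))
```

Passing `[[0.0]]` (no sensor selected) reduces a step to a pure prediction, and for one selected sensor the result coincides with the familiar update $C^p - C^p H (H C^p H + C^v)^{-1} H C^p$.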

The constraints in (2.4) and (2.5) together ensure that exactly one sensor per time step is selected for measurement. This restriction is made for reasons of brevity and clarity. The extension to selecting multiple sensors per time step can be achieved by replacing the right-hand side of (2.4) with the desired number of sensors. Alternatively, by modifying (2.2) and (2.3), it is also possible to minimize the sensor costs subject to a maximum allowed value of $J(\cdot)$, i.e., a maximum allowed uncertainty.
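To make the combinatorial nature of (2.2)-(2.5) concrete, the following sketch enumerates all $S^N$ schedules for a scalar model, taking $g_k$ as the scalar covariance itself. Schedules are stored as lists of sensor indices rather than one-hot vectors $\underline{u}_k$, which is equivalent under (2.4) and (2.5); all names and numbers are hypothetical.

```python
import itertools


def objective(schedule, a, w, c0, info_gain):
    """J = sum_k g_k with g_k = C^x_k (trace of a scalar covariance)."""
    total, c = 0.0, c0
    for sensor in schedule:
        predicted = a * c * a + w
        c = 1.0 / (1.0 / predicted + info_gain[sensor])
        total += c
    return total


def brute_force(N, costs, budget, a, w, c0, info_gain):
    """Enumerate all S**N index sequences; one index per step enforces (2.4)-(2.5)."""
    best_schedule, best_value = None, float("inf")
    for schedule in itertools.product(range(len(costs)), repeat=N):
        if sum(costs[i] for i in schedule) > budget:  # cost constraint (2.3)
            continue
        value = objective(schedule, a, w, c0, info_gain)
        if value < best_value:
            best_schedule, best_value = list(schedule), value
    return best_schedule, best_value


# Hypothetical example: sensor 0 is precise but expensive, sensor 1 is cheap.
print(brute_force(2, [2, 1], 3, 1.0, 1.0, 1.0, [4.0, 1.0]))
```

For this example, the search returns the schedule [0, 1] (precise sensor first) with $J \approx 0.772$; the loop body runs $S^N$ times, which is exactly what becomes prohibitive for long horizons.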
3 Convex Relaxation



The optimization problem in (2.2)-(2.5) is a so-called binary integer program. Problems of this type are known to be NP-hard (see [Kar72]), and thus obtaining the optimal solution for large $N$ and/or large $S$ is computationally prohibitive in general. However, by replacing the binary non-convex constraints in (2.5) with the linear constraints $\underline{u}_k \in [0,1]^S$ for $k = 1, \ldots, N$, a convex relaxation of the original problem (2.2) is obtained. To see this, note that the constraints (2.3) and (2.4) are linear and thus already convex. Furthermore, as shown in the following theorem, the sum to be minimized in (2.2) is now convex as well.



Theorem 1 (Convex Objective Function) The objective function $J(\underline{u}_{1:N})$ in (2.2) is convex in $\underline{u}_{1:N} \in [0,1]^{S \cdot N}$.



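PROOF. To prove the convexity of $g_k(\underline{u}_{1:k})$ and thus of $J(\underline{u}_{1:N})$, it must be shown that (see for example [BV08])

$$g_k\big(\lambda \cdot \underline{u}_{1:k} + (1-\lambda) \cdot \underline{\tilde{u}}_{1:k}\big) \leq \lambda \cdot g_k(\underline{u}_{1:k}) + (1-\lambda) \cdot g_k(\underline{\tilde{u}}_{1:k}) \tag{3.1}$$

for $k = 1, \ldots, N$, $\forall\, \underline{u}_{1:k}, \underline{\tilde{u}}_{1:k} \in [0,1]^{k \cdot S}$, and $\forall\, \lambda \in [0,1]$.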

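At first, it is proven by induction that the covariance recursion (2.6) is a convex function of $\underline{u}_{1:k}$, where inequalities between matrices are understood in the positive semidefinite (Loewner) sense. The induction starts with $\mathbf{C}^x_1(\underline{u}_1)$. Defining $\mathbf{M}^i_k := (\mathbf{H}^i_k)^{\mathrm{T}} \cdot (\mathbf{C}^{v,i}_k)^{-1} \cdot \mathbf{H}^i_k$ and $\mathbf{P}_1(\underline{u}_1) := (\mathbf{A}_0 \cdot \mathbf{C}^x_0 \cdot \mathbf{A}_0^{\mathrm{T}} + \mathbf{C}^w_0)^{-1} + \sum_{i=1}^{S} u_{1,i} \cdot \mathbf{M}^i_1$, which is affine in $\underline{u}_1$, and utilizing the results in [Kra36] on matrix convex functions, it follows from the matrix convexity of the matrix inversion that

$$
\begin{aligned}
\mathbf{C}^x_1\big(\lambda \cdot \underline{u}_1 + (1-\lambda) \cdot \underline{\tilde{u}}_1\big)
&= \big(\lambda \cdot \mathbf{P}_1(\underline{u}_1) + (1-\lambda) \cdot \mathbf{P}_1(\underline{\tilde{u}}_1)\big)^{-1} \\
&\preceq \lambda \cdot \mathbf{P}_1^{-1}(\underline{u}_1) + (1-\lambda) \cdot \mathbf{P}_1^{-1}(\underline{\tilde{u}}_1)
= \lambda \cdot \mathbf{C}^x_1(\underline{u}_1) + (1-\lambda) \cdot \mathbf{C}^x_1(\underline{\tilde{u}}_1)
\end{aligned}
$$

for all $\underline{u}_1, \underline{\tilde{u}}_1 \in [0,1]^S$ and all $\lambda \in [0,1]$. Defining the predicted covariance $\mathbf{C}^p_k(\underline{u}_{1:k-1}) := \mathbf{A}_{k-1} \cdot \mathbf{C}^x_{k-1}(\underline{u}_{1:k-1}) \cdot \mathbf{A}_{k-1}^{\mathrm{T}} + \mathbf{C}^w_{k-1}$, it generally holds that

$$
\begin{aligned}
\mathbf{C}^x_k\big(\lambda \cdot \underline{u}_{1:k} + (1-\lambda) \cdot \underline{\tilde{u}}_{1:k}\big)
&= \Big(\mathbf{C}^p_k\big(\lambda \cdot \underline{u}_{1:k-1} + (1-\lambda) \cdot \underline{\tilde{u}}_{1:k-1}\big)^{-1} + \sum_{i=1}^{S} \big(\lambda \cdot u_{k,i} + (1-\lambda) \cdot \tilde{u}_{k,i}\big) \cdot \mathbf{M}^i_k\Big)^{-1} \\
&\overset{(a)}{\preceq} \Big(\lambda \cdot \Big(\mathbf{C}^p_k(\underline{u}_{1:k-1})^{-1} + \sum_{i=1}^{S} u_{k,i} \cdot \mathbf{M}^i_k\Big) + (1-\lambda) \cdot \Big(\mathbf{C}^p_k(\underline{\tilde{u}}_{1:k-1})^{-1} + \sum_{i=1}^{S} \tilde{u}_{k,i} \cdot \mathbf{M}^i_k\Big)\Big)^{-1} \\
&\overset{(b)}{\preceq} \lambda \cdot \mathbf{C}^x_k(\underline{u}_{1:k}) + (1-\lambda) \cdot \mathbf{C}^x_k(\underline{\tilde{u}}_{1:k})
\end{aligned}
\tag{3.2}
$$

for $k = 2, \ldots, N$, $\forall\, \underline{u}_{1:k}, \underline{\tilde{u}}_{1:k} \in [0,1]^{k \cdot S}$, and $\forall\, \lambda \in [0,1]$. Here, (a) results from the induction hypothesis that $\mathbf{C}^x_{k-1}(\underline{u}_{1:k-1})$ is convex in $\underline{u}_{1:k-1}$, from the convexity of the matrix inversion, and from rearranging terms; (b) is the result of a repeated application of the convexity of the matrix inversion.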

As the trace is a linear matrix function and the root-determinant as well as the maximum eigenvalue are convex matrix functions (see for example [BV08]), and all three functions are nondecreasing with respect to the positive semidefinite ordering, the inequality in (3.1) holds if these three functions are applied to (3.2). Thus, $g_k(\underline{u}_{1:k})$ is convex, and the sum $J(\underline{u}_{1:N}) = \sum_{k=1}^{N} g_k(\underline{u}_{1:k})$ is convex as well, which concludes the proof. □
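The theorem can be spot-checked numerically in the scalar case, where it is easy to see why it holds: the information $1/C^x_k$ is a concave function of $\underline{u}_{1:k}$ (a concave nondecreasing map of the previous information plus a linear term), and $1/x$ is convex and decreasing. The sketch below, with hypothetical names, tests inequality (3.1) with $g_k$ being the scalar covariance at random relaxed points; a passing check is evidence, not a proof, and says nothing about the matrix case.

```python
import random


def variances(u, a, w, c0, info_gain):
    """Scalar recursion (2.6) for relaxed weights u[k][i] in [0, 1]."""
    out, c = [], c0
    for u_k in u:
        predicted = a * c * a + w
        c = 1.0 / (1.0 / predicted + sum(x * m for x, m in zip(u_k, info_gain)))
        out.append(c)
    return out


def check_convexity(trials, N, a, w, c0, info_gain, seed=0):
    """Test g_k(lam*u + (1-lam)*v) <= lam*g_k(u) + (1-lam)*g_k(v) at random points."""
    rng = random.Random(seed)
    S = len(info_gain)
    for _ in range(trials):
        u = [[rng.random() for _ in range(S)] for _ in range(N)]
        v = [[rng.random() for _ in range(S)] for _ in range(N)]
        lam = rng.random()
        mix = [[lam * x + (1.0 - lam) * y for x, y in zip(ru, rv)] for ru, rv in zip(u, v)]
        lhs = variances(mix, a, w, c0, info_gain)
        cu = variances(u, a, w, c0, info_gain)
        cv = variances(v, a, w, c0, info_gain)
        for k in range(N):
            if lhs[k] > lam * cu[k] + (1.0 - lam) * cv[k] + 1e-12:
                return False
    return True


print(check_convexity(1000, 5, 0.9, 0.5, 10.0, [1.0, 0.25, 5.0]))
```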

It is important to note that the sensor scheduling problem formulated by (2.2)-(2.5), together with the convexity of its relaxation proven in Theorem 1, extends existing convex approaches [CMPS07, JB09] in several ways. Instead of one-step time horizons, i.e., myopic/greedy scheduling, arbitrarily long time horizons are possible. Furthermore, the dynamics model in (2.1) need not be restricted to regular system matrices $\mathbf{A}_k$ and to system noise covariances $\mathbf{C}^w_k = \mathbf{0}$. Especially the latter is of paramount importance for realistic sensor scheduling problems. Finally, there is no restriction to a specific scalar function $g_k(\cdot)$ as in [CMPS07]. Instead, various functions for evaluating the quality of a sensor schedule are considered here.



3.1 Solving the Relaxed Problem 

The computational complexity of optimally solving the original binary integer program is in $\mathcal{O}(S^N)$. Various methods are available for efficiently solving the convex relaxation of the sensor scheduling problem, e.g., interior-point methods [BV08]. These methods typically require only a few tens of iterations for calculating the optimal solution, even for large problem sizes, e.g., time horizon lengths and numbers of sensors beyond 10. The computational complexity of one iteration is polynomial in the number of variables in $\underline{u}_{1:N}$, which is $S \cdot N$. The derivation of the gradient of $J(\underline{u}_{1:N})$ necessary for interior-point methods is shown in Appendix A.

The solution $\underline{u}^l_{1:N}$ of the convex problem, however, only approximates the optimal solution $\underline{u}^*_{1:N}$ of the original scheduling problem. More specifically, $\underline{u}^l_{1:N}$ is in general no longer binary, and the objective function value $J^l := J(\underline{u}^l_{1:N})$ is a lower bound of the optimal value $J(\underline{u}^*_{1:N})$. The latter finding follows directly from the convexity of the relaxed problem, which guarantees that $\underline{u}^l_{1:N}$ is a global minimizer, and from the fact that the relaxed solution set $[0,1]^{S \cdot N}$ contains the binary set of the original problem.
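For a tiny instance the relaxed problem can be solved by brute-force grid search, which serves here only as a stand-in for an interior-point solver. The sketch assumes $N = 2$, $S = 2$, and a hypothetical scalar model, and parametrizes $\underline{u}_k = [t_k, 1 - t_k]^{\mathrm{T}}$ so that (2.4) and the box constraint hold by construction. Because every binary schedule is a grid point, the grid minimum can never exceed the binary optimum, mirroring $J^l \leq J(\underline{u}^*_{1:N})$.

```python
def objective_relaxed(u, a, w, c0, info_gain):
    """J = sum_k C^x_k for (possibly fractional) weights u[k][i]; scalar model."""
    total, c = 0.0, c0
    for u_k in u:
        predicted = a * c * a + w
        c = 1.0 / (1.0 / predicted + sum(x * m for x, m in zip(u_k, info_gain)))
        total += c
    return total


def relaxed_grid_search(costs, budget, a, w, c0, info_gain, steps=100):
    """Crude stand-in for an interior-point solver, valid only for N = 2 and S = 2."""
    best_u, best_value = None, float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1):
            t1, t2 = i / steps, j / steps
            u = [[t1, 1.0 - t1], [t2, 1.0 - t2]]  # (2.4) and u_k in [0, 1]^S by construction
            cost = sum(ci * x for u_k in u for ci, x in zip(costs, u_k))
            if cost > budget + 1e-9:  # relaxed cost constraint (2.3)
                continue
            value = objective_relaxed(u, a, w, c0, info_gain)
            if value < best_value:
                best_u, best_value = u, value
    return best_u, best_value


model = dict(a=1.0, w=1.0, c0=1.0, info_gain=[4.0, 1.0])
u_l, J_l = relaxed_grid_search([2.0, 1.0], 3.0, **model)
# Binary optimum of this instance (found by enumerating its four schedules): sensor 0, then 1.
J_star = objective_relaxed([[1.0, 0.0], [0.0, 1.0]], **model)
print(u_l, J_l, J_star, J_l <= J_star)
```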
3.2 Conversion into Binary Solution

In order to allow selecting sensors for measurement, $\underline{u}^l_{1:N}$ has to be converted into a binary vector by employing an appropriate conversion or rounding method. The value $J^u := J(\underline{u}^u_{1:N})$ of the resulting (binary) sensor schedule should be as close as possible to the optimal one in order to provide informative sensor measurements. In the following, two appropriate conversion methods are introduced. Independent of the chosen conversion method, the value $J^u$ of a feasible converted sensor schedule provides an upper bound to the optimal value $J(\underline{u}^*_{1:N})$.



3.2.1 Sampling 

Each component $\underline{u}^l_k$ of $\underline{u}^l_{1:N}$ can be interpreted as a discrete probability distribution over the set $\mathcal{S}$ of sensor indices. This is due to the constraint in (2.4) together with the relaxed constraint $\underline{u}_k \in [0,1]^S$, whereby the elements $u^l_{k,i}$, $i = 1, \ldots, S$, of $\underline{u}^l_k$ lie within the interval $[0,1]$ and sum up to one. Hence, a sensor $i$ corresponding to an element $u^l_{k,i}$ with a large value can be considered more likely to be part of the optimal sensor schedule than sensors with small values.

To convert $\underline{u}^l_{1:N}$ into a feasible binary vector, for each $k = 1, \ldots, N$ a (single) sensor is randomly selected according to the distribution $\underline{u}^l_k$. To be feasible, the resulting converted schedule has to satisfy the cost constraint (2.3). Otherwise, the schedule is discarded. This procedure is repeated multiple times, where only the currently best feasible schedule, i.e., the schedule that satisfies (2.3) and provides the currently smallest objective function value $J^u$, is stored. The sampling-based conversion method can be terminated, for example, after a predefined number of trials or when the currently best value $J^u$ remains unchanged for a predefined number of trials.
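The sampling-based conversion can be sketched as follows for a scalar model; `sample_conversion` and its parameters are hypothetical, a fixed number of trials is used as the termination rule, and `random.Random.choices` draws a sensor index with probabilities proportional to $\underline{u}^l_k$.

```python
import random


def objective(schedule, a, w, c0, info_gain):
    """J = sum_k C^x_k for a schedule of sensor indices (scalar model)."""
    total, c = 0.0, c0
    for sensor in schedule:
        predicted = a * c * a + w
        c = 1.0 / (1.0 / predicted + info_gain[sensor])
        total += c
    return total


def sample_conversion(u_relaxed, costs, budget, trials, a, w, c0, info_gain, seed=0):
    """Keep the best feasible schedule among `trials` random draws from u^l_k."""
    rng = random.Random(seed)
    sensors = range(len(costs))
    best_schedule, best_value = None, float("inf")
    for _ in range(trials):
        # One sensor per time step, drawn with probabilities u^l_{k,i}.
        schedule = [rng.choices(sensors, weights=u_k, k=1)[0] for u_k in u_relaxed]
        if sum(costs[i] for i in schedule) > budget:
            continue  # violates the cost constraint (2.3): discard
        value = objective(schedule, a, w, c0, info_gain)
        if value < best_value:
            best_schedule, best_value = schedule, value
    return best_schedule, best_value


print(sample_conversion([[0.7, 0.3], [0.3, 0.7]], [2.0, 1.0], 3.0, 50, 1.0, 1.0, 1.0, [4.0, 1.0]))
```

Every stored schedule is feasible, so its value is an upper bound $J^u$ on the optimum; the fixed seed only makes the run reproducible.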

3.2.2 Swapping 

To improve a converted schedule, the swapping method proposed in [JB09] can be adapted. A modified sensor schedule is derived from $\underline{u}^u_{1:N}$ by swapping the scheduled sensor at one time step with one of the unselected sensors; this is done for each time step. The choice of an unselected sensor at time step $k$ is deterministically guided by the probabilities represented by $\underline{u}^l_k$, i.e., the sensors are tried in descending order of the values in $\underline{u}^l_k$. If the modified schedule is feasible and improves the objective function value $J^u$, it is used for initializing the next swapping trial.

In order to start the swapping method with a feasible schedule, the sensor schedule that selects at each time step $k$ the sensor $i = \arg\min_j c_{k,j}$ with the smallest cost is chosen initially. The method terminates because every accepted swap strictly decreases $J^u$ and there is only a finite, but very large, number of swapping possibilities. To bound the computational demand, the number of swapping trials is limited by means of a predefined value.
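A sketch of the swapping conversion under two assumptions: the costs are time-invariant ($\underline{c}_k = \underline{c}$, as in Section 5), and one trial means evaluating one candidate swap. `swapping` and its parameters are hypothetical names; after an improving swap, the search moves on to the next time step.

```python
def objective(schedule, a, w, c0, info_gain):
    """J = sum_k C^x_k for a schedule of sensor indices (scalar model)."""
    total, c = 0.0, c0
    for sensor in schedule:
        predicted = a * c * a + w
        c = 1.0 / (1.0 / predicted + info_gain[sensor])
        total += c
    return total


def swapping(u_relaxed, costs, budget, max_trials, a, w, c0, info_gain):
    """Local improvement of a binary schedule, guided by u^l (time-invariant costs)."""
    N, S = len(u_relaxed), len(costs)
    cheapest = min(range(S), key=lambda i: costs[i])
    schedule = [cheapest] * N  # cheapest sensor at every step: feasible if anything is
    if N * costs[cheapest] > budget:
        return None, float("inf")
    best = objective(schedule, a, w, c0, info_gain)
    trials, improved = 0, True
    while improved and trials < max_trials:
        improved = False
        for k in range(N):
            # Try unselected sensors in descending order of u^l_{k,i}.
            for i in sorted(range(S), key=lambda j: -u_relaxed[k][j]):
                if i == schedule[k]:
                    continue
                if trials >= max_trials:
                    break
                trials += 1
                candidate = schedule[:k] + [i] + schedule[k + 1:]
                if sum(costs[j] for j in candidate) > budget:
                    continue  # infeasible swap
                value = objective(candidate, a, w, c0, info_gain)
                if value < best:
                    schedule, best, improved = candidate, value, True
                    break  # keep the improvement, continue with the next time step
    return schedule, best


print(swapping([[0.7, 0.3], [0.3, 0.7]], [2.0, 1.0], 3.0, 4, 1.0, 1.0, 1.0, [4.0, 1.0]))
```

With $S \cdot N = 4$ trials, as used for CONVEX and BBC in Section 5, this small example moves from the cheapest schedule [1, 1] to [0, 1].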



4 Optimal Scheduling 

Determining the optimal sensor schedule, i.e., directly solving the binary integer program given by (2.2)-(2.5), can be considered as searching a decision tree with depth $N$ and branching factor $S$. The problem here is that the optimal solution can often be found at an early stage when employing appropriate search methods, while the proof of its optimality requires evaluating most of the suboptimal sensor schedules, which is infeasible for large problem sizes. In this section, the previously introduced convex optimization approach is combined with efficient search methods for decision trees in order to eliminate (prune) suboptimal schedules early.



4.1 Branch-and-Bound 



A search technique common for classical decision problems like the traveling-salesman or knapsack problem is branch-and-bound (BB) search. The basic idea of BB is to assign lower and upper bounds of the achievable objective function value to every visited node. Based on these bounds, nodes and thus complete subtrees can be pruned with the guarantee that the pruned node is not part of the optimal sensor schedule.

For a particular node that was reached during the search by employing the sensor schedule $\underline{u}_{1:k} \in \{0,1\}^{k \cdot S}$, the objective function can be written as

$$J(\underline{u}_{1:N}) = \underbrace{J(\underline{u}_{1:k})}_{\text{known}} + \underbrace{J(\underline{u}_{k+1:N})}_{\text{unknown}} , \tag{4.1}$$

where only the value of the first summand has already been evaluated and is thus known. While the value of the second summand has not been calculated yet, lower and upper bounds can easily be assigned to it by exploiting the results of Section 3.1 and Section 3.2. The value of the optimal solution $\underline{u}^l_{k+1:N}$ of the convex relaxation for minimizing $J(\underline{u}_{k+1:N})$ serves as lower bound, and the conversion of $\underline{u}^l_{k+1:N}$ into a binary-valued vector $\underline{u}^u_{k+1:N}$ provides an upper bound. Hence, the inequality

$$J(\underline{u}_{1:k}) + J(\underline{u}^l_{k+1:N}) \leq \min_{\underline{u}_{k+1:N}} J(\underline{u}_{1:N}) \leq J(\underline{u}_{1:k}) + J(\underline{u}^u_{k+1:N})$$

holds for the best objective function value (4.1) attainable in the subtree below the node.






Algorithm 4.1 Initially J_min = ∞. For a given sensor schedule u_{1:k} do:

 1: if leaf node, i.e., k = N then
 2:     J_min ← J(u_{1:N})                        // Global bound of currently best schedule u_{1:N}
 3: else
 4:     U ← ∅                                      // List of sensors to expand
 5:     for all sensors i ∈ {1, ..., S} do         // u_{1:k} and u_{k+1,i} = 1 fixed
 6:         if cost_i ≤ C and J_i < J_min then
 7:             J^l_i ← Solve convex optimization problem
 8:             J^u_i ← Calculate upper bound via conversion
 9:             U ← U ∪ {i}
10:         end if
11:     end for
12:     U ← sort(U)                                // Sort sensors based on J^l_i
13:     for all sensors i ∈ U do
14:         if J^l_i < J_min and ∀ j ∈ U \ {i}: J^l_i < J^u_j then
15:             Expand i                           // Set u_{k+1,i} = 1, call Algorithm 4.1
16:         end if
17:     end for
18: end if



It is worth mentioning that a valuable upper bound, i.e., a tight one, normally cannot be provided for branch-and-bound in sensor scheduling tasks (see for example [CMPS06, Hub09]). Here, the solution of the convex relaxation allows calculating tight upper bounds in a straightforward fashion. These further reduce the size of the search space.



4.2 Search Algorithm 



The combination of BB search with convex optimization is illustrated in Algorithm 4.1, which basically employs a depth-first search. For a given sensor schedule $\underline{u}_{1:k}$, it is checked which child nodes should be expanded, i.e., whether an element $u_{k+1,i}$, $i \in \mathcal{S}$, of $\underline{u}_{k+1}$ could be set to one or not. Therefore, for each child node $i \in \mathcal{S}$ the minimum possible cost is computed as

$$\mathrm{cost}_i := \sum_{n=1}^{k} \underline{c}_n^{\mathrm{T}} \cdot \underline{u}_n + c_{k+1,i} + \sum_{n=k+2}^{N} \min_j c_{n,j} .$$



Furthermore, the value $J_i := J(\underline{u}_{1:k+1})$ and the bounds $J^l_i := J_i + J(\underline{u}^l_{k+2:N})$ and $J^u_i := J_i + J(\underline{u}^u_{k+2:N})$ are calculated, where $u_{k+1,i} = 1$ and $u_{k+1,j} = 0$ for all $j \neq i$. Based on these values, a node $i$ is expanded only if the following four requirements are fulfilled:

1. The cost constraint can be met, i.e., a feasible solution exists (line 6).

2. The value $J_i$ of the node is below the value $J_{\min}$ of the currently best sensor schedule (line 6).

3. The lower bound $J^l_i$ is below $J_{\min}$ (line 14).

4. The lower bound $J^l_i$ is below the upper bounds $J^u_j$ of all neighboring nodes $j \neq i$ (line 14).

Obviously, the third requirement implies the second one, since $J^l_i \geq J_i$ for nonnegative functions $g_k(\cdot)$. But in order to avoid an unnecessary calculation of the lower and upper bounds, the second requirement is checked separately together with the first requirement (lines 6-10). To further accelerate the search, the remaining sensors in U are sorted in ascending order of their lower bounds (line 12).¹ In doing so, the search is continued with the most promising sensor first in order to force a stronger reduction of the currently best value $J_{\min}$. This value is automatically reduced once a leaf node is reached (lines 1-2).
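A compact sketch of the depth-first search of Algorithm 4.1 for a scalar model is shown below. It is deliberately the simplest variant: the unknown second summand of (4.1) is bounded from below by zero (as in BBZ of Section 5), and the neighbor test with the upper bounds in line 14 is omitted; BBC would plug the convex lower bound and the converted upper bound into the commented places. Time-invariant costs and all names are assumptions.

```python
import math


def prefix_value(prefix, a, w, c0, info_gain):
    """J(u_{1:k}) for a prefix of sensor indices (scalar model, g_k = C^x_k >= 0)."""
    total, c = 0.0, c0
    for sensor in prefix:
        predicted = a * c * a + w
        c = 1.0 / (1.0 / predicted + info_gain[sensor])
        total += c
    return total


def branch_and_bound(N, costs, budget, a, w, c0, info_gain):
    """Depth-first BB search; returns (schedule, value, number of expanded nodes)."""
    S, cheapest = len(costs), min(costs)
    best = {"value": math.inf, "schedule": None, "expanded": 0}

    def expand(prefix):
        k = len(prefix)
        if k == N:  # leaf (lines 1-2): it was only expanded because it beats J_min
            best["value"] = prefix_value(prefix, a, w, c0, info_gain)
            best["schedule"] = list(prefix)
            return
        children = []
        for i in range(S):  # lines 5-11
            child = prefix + [i]
            # cost_i: costs so far + c_{k+1,i} + cheapest sensor at every later step
            cost_i = sum(costs[j] for j in child) + (N - k - 1) * cheapest
            if cost_i > budget:
                continue
            value = prefix_value(child, a, w, c0, info_gain)
            if value < best["value"]:
                lower = value + 0.0  # BBZ bound; BBC would add the relaxed value J^l here
                children.append((lower, i))
        children.sort()  # line 12: smallest lower bound first
        for lower, i in children:  # lines 13-17 (neighbor test with J^u omitted)
            if lower < best["value"]:  # J_min may have shrunk meanwhile
                best["expanded"] += 1
                expand(prefix + [i])

    expand([])
    return best["schedule"], best["value"], best["expanded"]


print(branch_and_bound(2, [2.0, 1.0], 3.0, 1.0, 1.0, 1.0, [4.0, 1.0]))
```

Because each $g_k$ is nonnegative, the partial value of a prefix is already a valid lower bound, which is what makes the zero bound safe; tighter bounds only change how much is pruned, not the result.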



5 Simulation Results 



The effectiveness of the proposed sensor scheduling methods is demonstrated in the following by means of a numerical simulation from the field of target tracking. The state $\underline{\boldsymbol{x}}_k = [x_k, \dot{x}_k, y_k, \dot{y}_k]^{\mathrm{T}}$ of the observed target comprises the two-dimensional position $[x_k, y_k]^{\mathrm{T}}$ and the velocities $[\dot{x}_k, \dot{y}_k]^{\mathrm{T}}$ in x and y direction. The system matrix and the noise covariance matrix of $\underline{\boldsymbol{w}}_k$ of the dynamics model are



Afc = I 2 ® 



1 T 

0 1 



and Cj“ 




(5.1) 



respectively, where I„ indicates an n x n identity matrix and <g> is the Kronecker 
matrix product. In <HT}, T = 1 s is the sampling interval and q = 0.2 is the scalar 
diffusion strength. Mean and covariance of the initial state x 0 are x 0 = [0,1, 0, 1] T 
and Cq = 10 • I4, respectively. 
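The system matrix in (5.1) and the initial state parameters can be assembled with a small Kronecker-product helper. This sketch uses nested lists to stay dependency-free; `kron` and `identity` are hypothetical helpers (NumPy's `numpy.kron` would do the same), and the noise covariance $\mathbf{C}^w_k$ is not built here.

```python
def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [
        [a[r1][c1] * b[r2][c2] for c1 in range(len(a[0])) for c2 in range(len(b[0]))]
        for r1 in range(len(a))
        for r2 in range(len(b))
    ]


def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


T = 1.0  # sampling interval in seconds
A = kron(identity(2), [[1.0, T], [0.0, 1.0]])  # system matrix of (5.1)
C0 = [[10.0 * x for x in row] for row in identity(4)]  # initial covariance 10 * I_4
x0_mean = [0.0, 1.0, 0.0, 1.0]  # initial mean [x, x_dot, y, y_dot]
for row in A:
    print(row)
```

The block structure shows that the x and y coordinates evolve independently, each as a constant-velocity model.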

1 Alternatively, U can be sorted in ascending order with respect to the upper bounds. 
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- - BBZ, Ci 

- - BBL, Ci 

- - BBC, Ci 
— BBZ, C 2 
— BBL, C 2 
— BBC, C 2 

1 2 4 6 8 10 

N — > 

Figure 5.1: Number of nodes in the decision tree when applying the branch-and- 
bound-based scheduling methods BBC (black lines), BBL (green), and BBZ (red) 
for different time horizons lengths N and for two different maximum cost 
functions Ci (dashed) and C 2 (solid) in log-scale. 




A sensor network observes the target. It consists of six sensors with measurement matrices

$$\mathbf{H}^1_k = \mathbf{H}^2_k = [1\ 0\ 0\ 0] , \quad \mathbf{H}^3_k = \mathbf{H}^4_k = [0\ 0\ 1\ 0] , \quad \mathbf{H}^5_k = [0\ 0\ 0\ 1] , \quad \mathbf{H}^6_k = [0\ 1\ 0\ 0] ,$$

noise variances $C^{v,1}_k = 0.2$, $C^{v,2}_k = C^{v,3}_k = C^{v,4}_k = 0.1$, $C^{v,5}_k = C^{v,6}_k = 0.05$, and costs $\underline{c}_k = [1, 1, 2, 1, 2, 2]^{\mathrm{T}}$ for each $k$. Furthermore, it is also possible to omit a measurement. This option can be considered as having a seventh sensor with infinite noise variance. Performing no measurement is free of cost, i.e., $c_{k,7} = 0$. Altogether, the set $\mathcal{S}$ comprises $S = 7$ sensors. The scalar functions $g_k(\cdot)$ are set to the root-determinant for each $k$.

For comparison, five different scheduling methods are considered: (1, denoted in the following by CONVEX) The approach described in Section 3, which directly solves the convex optimization problem and employs the swapping method for conversion. (2, BBC) The BB approach described in Section 4. For determining the upper bounds via conversion, the swapping method is employed. (3, BBL) Like BBC but without utilizing upper bounds for pruning. (4, BBZ) BB search that employs no upper bounds and bounds the second summand in (4.1) from below with zero. (5, GREEDY) The sensors are scheduled in a greedy (one-step lookahead) fashion (see for example [KG05]). For CONVEX and BBC, the number of swapping trials is set to $S \cdot N$.
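A one-step lookahead (GREEDY) scheduler can be sketched for a scalar model as follows. The feasibility rule, which keeps enough budget to pay the cheapest option at every remaining step, is an assumption of this sketch, and omitting a measurement is modeled as a sensor with zero information gain and zero cost; all names and numbers are hypothetical.

```python
def schedule_value(schedule, a, w, c0, info_gain):
    """J = sum_k C^x_k for a schedule of sensor indices (scalar model)."""
    total, c = 0.0, c0
    for sensor in schedule:
        predicted = a * c * a + w
        c = 1.0 / (1.0 / predicted + info_gain[sensor])
        total += c
    return total


def greedy(N, costs, budget, a, w, c0, info_gain):
    """One-step lookahead: pick the sensor with the smallest next variance."""
    cheapest = min(costs)
    schedule, spent, c = [], 0.0, c0
    for k in range(N):
        predicted = a * c * a + w
        best_i, best_c = None, float("inf")
        for i, cost in enumerate(costs):
            # Keep enough budget for the cheapest option at every remaining step.
            if spent + cost + (N - k - 1) * cheapest > budget:
                continue
            c_i = 1.0 / (1.0 / predicted + info_gain[i])
            if c_i < best_c:
                best_i, best_c = i, c_i
        if best_i is None:
            return None
        schedule.append(best_i)
        spent += costs[best_i]
        c = best_c
    return schedule


# Hypothetical sensors: 0 precise (cost 2), 1 "no measurement" (cost 0), 2 medium (cost 1).
model = dict(a=1.0, w=1.0, c0=1.0, info_gain=[4.0, 0.0, 2.0])
g = greedy(2, [2.0, 0.0, 1.0], 2.0, **model)
print(g, schedule_value(g, **model), schedule_value([2, 2], **model))
```

Greedy spends the whole budget on the precise sensor at the first step and must omit the second measurement, ending with a larger $J$ than scheduling the medium sensor twice, which illustrates the myopic behavior of GREEDY.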

Figure 5.2: Objective function values J of the scheduling methods BBC (black, solid), CONVEX (green, dotted), and GREEDY (red, dashed) for maximum cost function $C_1$.

In Figure 5.1, the search performance of the three BB methods is compared. For this purpose, two different maximum costs $C_1(N) = \mathrm{round}(\ldots \cdot N)$ and $C_2(N) = 2 \cdot N$ are considered, which depend on the time horizon length $N = 1, \ldots, 10$. The maximum cost function $C_1(N)$ allows sensor scheduling without omitting a measurement. With the proposed optimal scheduling method BBC, the number of nodes in the decision tree can be kept at a low level. Here, the search performance clearly benefits from the tight lower and upper bounds provided by the convex optimization and the conversion, respectively. This can be seen in particular for $C_2$, where BBC visits at most 92 nodes, while the complete decision tree contains $\sum_{k=1}^{10} 7^k < 3.3 \cdot 10^8$ nodes. The higher number of visited nodes for cost function $C_1$ compared to $C_2$ results from the effect that the more restrictive cost constraint provided by $C_1$ leads to looser bounds.
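The size of the complete decision tree quoted above follows from summing the $S^k$ nodes at each depth $k$; the helper name below is hypothetical.

```python
def decision_tree_nodes(S, N):
    """Nodes below the root of a decision tree with branching factor S and depth N."""
    return sum(S ** k for k in range(1, N + 1))


print(decision_tree_nodes(7, 10))  # 329554456, i.e., just below 3.3e8
```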

Without using upper bounds for pruning, as is the case for BBL, the number of visited nodes increases significantly. Still, the search performance of BBL is much better than that of BBZ, as the lower bound provided by the solution of the convex optimization is closer to the true values of the subtrees.

Since calculating lower and upper bounds by means of convex relaxation is computationally more demanding than calculating the simple bound used for BBZ, the runtime of BBZ is lower for short time horizons, even though BBZ leads to larger decision trees. With increasing length of the time horizon, however, the difference in runtime between BBZ and the other BB methods becomes smaller, and at some point both BBC and BBL outperform BBZ. For example, with the current, barely optimized implementation based on MATLAB version 7.9, BBC outperforms BBZ from horizon length $N = 9$ on for cost function $C_1$. With an optimized implementation, BBC is expected to outperform BBZ already for significantly shorter time horizons.
In Figure 5.2, the objective function values of BBC are compared with those of GREEDY and CONVEX for the costs $C_1(N)$. The GREEDY method is the computationally cheapest one but provides highly suboptimal results. Due to its myopic planning, GREEDY is not able to anticipate the long-term effect of selecting costly sensors early. In this simulation example, GREEDY omits measurements at the last time steps of the horizon rather than in between in order to meet the maximum cost constraint. The proposed suboptimal CONVEX method provides sensor schedules close to the optimal ones, while its computational demand is significantly smaller than that of BBC, especially for very long time horizons. CONVEX thus trades scheduling quality off against scheduling complexity, which is desirable for computationally constrained sensor systems.



6 Conclusions and Future Work 



Employing convex optimization for determining long-term sensor schedules is a 
promising approach. In this report, a general sensor scheduling problem for linear 
Gaussian models was formulated and the convexity of its relaxation was proven. 
Based on this result, two scheduling methods utilizing convex optimization have 
been proposed. The first approach directly solves the relaxed sensor scheduling 
problem in order to provide suboptimal but computationally cheap solutions. The 
second approach provides the optimal sensor schedule, where convex optimiza- 
tion is utilized for eliminating suboptimal sensor schedules at an early stage of 
branch-and-bound search. Compared to existing approaches on sensor scheduling 
via convex optimization, general linear Gaussian sensor scheduling problems are 
covered. Furthermore, both proposed scheduling methods are appropriate for long 
time horizons and many sensors, where choosing the better suited approach for a 
given scheduling problem depends on the requirements on estimation quality and 
computational capabilities. 

Future work is mainly devoted to three aspects: improving the speed of the branch-and-bound search, incorporating nonlinear dynamics and sensor models, and applying model-predictive/moving horizon control. The search speed can, for example, be further improved by incorporating so-called Gomory cuts, i.e., additional inequality constraints that reduce the size of the search space [SM99]. By this means, the proposed branch-and-bound search would be extended into a branch-and-cut search. A completely different way of solving the sensor scheduling problem would be to employ so-called outer approximation [FL94]. Here, the convex relaxation of the sensor scheduling problem is transformed into a series of mixed-integer linear programs that finally leads to the optimal (binary) sensor schedule. A comparison of branch-and-cut search and outer approximation with respect to computational demand would be of particular interest.

The incorporation of nonlinear models can be achieved by converting the nonlinear models into linear ones via linearization, e.g., first-order Taylor series expansion or statistical linearization as in [Hub09]. This linearization can be combined with model-predictive control in order to facilitate sensor scheduling for very long or even infinite time horizons.

Appendix 

A Analytical Expression of the Gradient 

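Since interior-point methods employ Newton's method, the computation time for solving the relaxed sensor scheduling problem can be significantly reduced by providing an analytical expression of the gradient of the objective function $J(\underline{u}_{1:N})$. The gradient is given by

$$\frac{\partial J(\underline{u}_{1:N})}{\partial \underline{u}_{1:N}} = \left[ \frac{\partial J(\underline{u}_{1:N})}{\partial u_{1,1}}, \ldots, \frac{\partial J(\underline{u}_{1:N})}{\partial u_{1,S}}, \ldots, \frac{\partial J(\underline{u}_{1:N})}{\partial u_{N,S}} \right]^{\mathrm{T}} ,$$

which boils down to calculating the derivatives

$$\frac{\partial J(\underline{u}_{1:N})}{\partial u_{k,i}} = \frac{\partial g_k(\underline{u}_{1:k})}{\partial u_{k,i}} + \sum_{m=k+1}^{N} \frac{\partial g_m(\underline{u}_{1:m})}{\partial u_{k,i}} \tag{A.1}$$

for $k = 1, \ldots, N$ and $i = 1, \ldots, S$, since $g_m(\underline{u}_{1:m})$ does not depend on $u_{k,i}$ for $m < k$.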



The partial derivative in (A.1) requires determining the derivative

$$\partial X^{-1} = -X^{-1}\, (\partial X)\, X^{-1}, \qquad X \in \mathbb{R}^{n \times n},$$

and defining

(A.2)

2 The argument $u_{1:k}$ is omitted in the following for clarity.
The first row in (A.3) can be written in recursive form as

(A.3)

commencing from $\partial C_k / \partial u_{k,i} = -C_k \cdot M_k^i \cdot C_k$ for $i = 1, \ldots, S$. For each element $i = 1, \ldots, S$ of the second row in (A.3), it holds that $\partial (u_{k,i} \cdot M_k^i) / \partial u_{k,i} = M_k^i$.
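The recursion start can be checked numerically. The following sketch assumes (our assumption, since the relevant definitions are not recoverable from this excerpt) that $C_k^{-1}$ depends affinely on the relaxed schedule, $C_k^{-1} = C_{k,0}^{-1} + \sum_i u_{k,i} M_k^i$ with symmetric positive semidefinite $M_k^i$, so that the inverse-derivative identity yields $\partial C_k / \partial u_{k,i} = -C_k M_k^i C_k$. The function names and the random test matrices are hypothetical; the code compares this expression with central finite differences.

```python
import numpy as np


def covariance(u, info0, M):
    """Assumed model: C(u) = (info0 + sum_i u_i * M_i)^(-1)."""
    info = info0 + sum(ui * Mi for ui, Mi in zip(u, M))
    return np.linalg.inv(info)


def analytic_derivatives(u, info0, M):
    """dC/du_i = -C M_i C, from d(X^-1) = -X^-1 dX X^-1 with dX = M_i du_i."""
    C = covariance(u, info0, M)
    return [-C @ Mi @ C for Mi in M]


def max_finite_difference_error(seed=0, n=3, S=2, eps=1e-6):
    """Largest deviation between central finite differences and the analytic result."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    info0 = A @ A.T + n * np.eye(n)        # symmetric positive definite prior information
    M = []
    for _ in range(S):
        h = rng.standard_normal((n, 1))    # hypothetical scalar measurement, unit noise variance
        M.append(h @ h.T)                  # M_i = h_i h_i^T is positive semidefinite
    u = rng.uniform(0.0, 1.0, size=S)      # relaxed schedule entries in [0, 1]

    analytic = analytic_derivatives(u, info0, M)
    worst = 0.0
    for i in range(S):
        e = np.zeros(S)
        e[i] = eps
        numeric = (covariance(u + e, info0, M) - covariance(u - e, info0, M)) / (2 * eps)
        worst = max(worst, float(np.max(np.abs(numeric - analytic[i]))))
    return worst


print(max_finite_difference_error())  # tiny, limited only by floating-point round-off
```

Under the same assumption, the derivative of a trace objective follows directly as $-\mathrm{trace}(C_k M_k^i C_k)$.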



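As the function $g(\cdot)$ can be the trace, the root-determinant, or the maximum eigenvalue (see Section 2), the identities

$$\partial\, \mathrm{trace}(X) = \mathrm{trace}(\partial X), \qquad (A.4)$$

$$\partial \sqrt[n]{|X|} = \tfrac{1}{n} \sqrt[n]{|X|}\; \mathrm{trace}\big(X^{-1}\, \partial X\big), \qquad (A.5)$$

$$\partial \lambda_i(X) = v_i^{\mathrm{T}}\, \partial X\, v_i \qquad (A.6)$$

from matrix calculus in differential form have to be applied to each partial derivative to conclude the derivation of the derivatives in (A.1). In (A.4) to (A.6), $X \in \mathbb{R}^{n \times n}$ has to be replaced by $C_k$. (A.4) and (A.5) can be found in [PP08]. In (A.6), which holds for a symmetric $X$ and a simple eigenvalue $\lambda_i$, $\lambda_i$ is the $i$-th eigenvalue of $X$ and $v_i$ is the corresponding $i$-th normalized eigenvector, with $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_n$ (see [OW95]).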
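The identities (A.4) to (A.6) can likewise be sanity-checked. The sketch below, with hypothetical helper names, compares central finite differences along a random symmetric direction $\partial X$ with the right-hand sides for a random symmetric positive definite $X$. (A.6) is checked only for the largest eigenvalue $\lambda_n$, which is simple with probability one for such random matrices; for repeated eigenvalues the identity does not apply in this form.

```python
import numpy as np


def root_det(X):
    """n-th root of det(X) for an n x n matrix."""
    return np.linalg.det(X) ** (1.0 / X.shape[0])


def max_eig(X):
    """Largest eigenvalue of a symmetric matrix (eigvalsh returns them in ascending order)."""
    return np.linalg.eigvalsh(X)[-1]


def identity_errors(seed=1, n=4, eps=1e-6):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    X = A @ A.T + n * np.eye(n)        # symmetric positive definite, like C_k
    B = rng.standard_normal((n, n))
    dX = (B + B.T) / 2                 # symmetric perturbation direction

    def directional(f):
        # central finite-difference derivative of f at X along dX
        return (f(X + eps * dX) - f(X - eps * dX)) / (2 * eps)

    _, V = np.linalg.eigh(X)
    v_max = V[:, -1]                   # normalized eigenvector of the largest eigenvalue
    return {
        "A.4": abs(directional(np.trace) - np.trace(dX)),
        "A.5": abs(directional(root_det) - root_det(X) / n * np.trace(np.linalg.solve(X, dX))),
        "A.6": abs(directional(max_eig) - v_max @ dX @ v_max),
    }


print(identity_errors())  # all three errors should be close to zero
```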
Abstract: This contribution presents a short overview of the development of the Mumford-Shah functional (MSF) and its applications in image processing, with emphasis on image registration. MSF was primarily developed for image segmentation, its main advantage being that it does not require prior information. Its main disadvantages are its ill-posedness and the need for discretization when the continuous formulation is employed.

Despite these disadvantages, MSF is now also employed for object detection, image registration, and inpainting. The known methods for the registration of stereo image series perform a pairwise registration, mostly based on grey values. If the image series are combined image series (e.g., stereo and spectral image series), a registration method that is invariant with regard to the grey values is necessary. This contribution presents new ideas of how MSF can be used to perform an edge-based registration of such combined image series.



1 Introduction 



The Mumford-Shah functional (MSF) was developed by D. Mumford and J. Shah and first published in a short article in 1985 [MS85]. This was followed by a detailed contribution in 1989, which also discusses optimal approximations [MS89]. The functional can be considered one of the most powerful tools for image segmentation, its main advantage being that no prior information is necessary. Its main disadvantage is that the functional in its standard form is ill-posed.

As a result, subsequent research has followed two directions:

• The first direction is followed by those who employ the functional in image processing applications, either trying to regularize it or using a discrete formulation.

• The second direction is mainly followed by mathematicians, who study and prove properties of the functional and possible regularizations, mostly for the n-dimensional case; see Section 3.

In the following, the problem formulation is presented in its continuous form for the two-dimensional case. The main goal is to segment an image (defined as a function)

$$g : \Omega \to \mathbb{R}, \qquad \Omega \subset \mathbb{R}^2,$$

such that the resulting image can be written as

$$s : \Omega \to \mathbb{R}, \qquad \text{with } \Omega = \Omega_1 \cup \Omega_2 \cup \ldots \cup \Omega_n \cup \Gamma .$$

The domain $\Omega$ of the original image $g$ is segmented into several regions $\Omega_i$ and $\Gamma$, which is a closed set of $C^1$ curves [MS85, MS89].

The image segmentation is obtained by minimizing MSF:

$$E(s, \Gamma) := \mu E_d(s, \Gamma) + E_s(s, \Gamma) + \nu E_l(\Gamma). \qquad (1.1)$$

Thus, through its terms, the minimization of the functional imposes the following conditions on the result:

• $E_d(s, \Gamma)$ ensures that the segmented image is as similar as possible to the original image.

• $E_s(s, \Gamma)$ demands the smoothness of the result.

• $E_l(\Gamma)$ demands that the sum of the lengths of the curves is minimal.

$\mu$ and $\nu$ are weights for scale and contrast: $\mu$ influences the dimensions of the segmented regions and $\nu$ influences the contrast between them.

For the discrete case, a more common and simpler formulation that describes images using matrices is chosen (unlike in [MS85, MS89], where the notion of lattices [Her67] is employed). In this case, the image to be segmented is represented by a matrix $B$. The segmented image is a partitioning of the initial image. This means that the resulting image can be represented by distinct submatrices $S_i$ of $B$, such that all $S_i$ put together form $B$. $\Gamma$ remains a closed set of $C^1$ curves describing the partitioning; see Figure 1.1. The structure of the functional is the same as in Eq. (1.1).
Figure 1.1: Examples for the set of curves $\Gamma$ for the continuous and discrete case [MS85, MS89].

The present contribution first gives an overview of the developed models of the Mumford-Shah functional in Section 2, followed by some of its main properties in Section 3. Further on, a parallel between the discrete and continuous models is drawn in Section 4. An overview of the applications of MSF in image processing is given in Section 5. The emphasis is on possibilities to use MSF for the registration of combined stereo and spectral image series.



2 Models of the Mumford-Shah Functional 

The simplest model is the Ising model [Mum94]:

$$E(s, \Gamma)_{\mathrm{Ising}} := \mu E_d(s, \Gamma) + \nu E_l(\Gamma) = \mu \iint_{\Omega} \big(s(u) - g(u)\big)^2 \, du + \nu \mathcal{H}^1(\Gamma).$$

It consists of only two terms, renouncing the demand for smoothness of the result, at the cost of only being able to segment grey-value images in which foreground and background "obviously" differ [Mum94]. The first term measures the similarity between the original image $g$ and the segmented one $s$. The second term gives the length of the curve by means of the 1D Hausdorff measure $\mathcal{H}^1$ [CS05]. The Ising model has the advantage over simple threshold-based segmentation of handling outliers better.

The second model is called the Cartoon model, and it is the standard model of MSF:

$$E(s, \Gamma) := \mu \iint_{\Omega} \big(s(u) - g(u)\big)^2 du + \iint_{\Omega \setminus \Gamma} \|\nabla s(u)\|^2 du + \nu \mathcal{H}^1(\Gamma). \qquad (2.1)$$

Compared with the Ising model, the additional term $E_s(s, \Gamma)$ assures the smoothness of the result on the entire domain $\Omega$ without $\Gamma$.
The discrete formulation is similar to the continuous one [MS85, MS89]:

$$E(\Gamma)_{\mathrm{discrete}} := \mu \sum_{k} (s_k - g_k)^2 + \sum_{(k,l) \in \mathcal{N}} (s_k - s_l)^2 + \nu \|\Gamma\| .$$

$s_k$ and $g_k$ are elements of the resulting matrix $S$ and the original matrix $B$, respectively (i.e., $s_k$ and $g_k$ are image pixels). $\mathcal{N}$ is a set containing the indices of neighbouring matrix elements (i.e., neighbouring pixels in the image), and $\|\Gamma\|$ describes the length of the curves (in general, $\|\Gamma\|$ is a measure of volume).
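A minimal sketch of evaluating the discrete functional on a 4-neighbourhood follows. Two choices are our assumptions, not statements of [MS85, MS89]: neighbour pairs separated by $\Gamma$ are excluded from the smoothness sum (mirroring the integral over $\Omega \setminus \Gamma$ in Eq. (2.1)), and $\|\Gamma\|$ is approximated by the number of separated neighbour pairs. The function name and the boolean edge masks are hypothetical.

```python
import numpy as np


def discrete_ms_energy(g, s, cut_h, cut_v, mu=1.0, nu=1.0):
    """Discrete Cartoon energy on a 4-neighbourhood.

    g, s  : original image B and segmented image S (2-D arrays of equal shape)
    cut_h : bool array (rows, cols - 1); True where Gamma separates (r, c) and (r, c + 1)
    cut_v : bool array (rows - 1, cols); True where Gamma separates (r, c) and (r + 1, c)
    """
    data = mu * np.sum((s - g) ** 2)
    dh = (s[:, 1:] - s[:, :-1]) ** 2                    # squared differences, horizontal neighbours
    dv = (s[1:, :] - s[:-1, :]) ** 2                    # squared differences, vertical neighbours
    smooth = np.sum(dh[~cut_h]) + np.sum(dv[~cut_v])    # only pairs not separated by Gamma
    length = nu * (np.count_nonzero(cut_h) + np.count_nonzero(cut_v))  # ||Gamma|| as edge count
    return data + smooth + length


g = np.zeros((4, 4))
g[:, 2:] = 2.0                      # two flat regions with a step between columns 1 and 2
s = g.copy()
no_cut_h, no_cut_v = np.zeros((4, 3), bool), np.zeros((3, 4), bool)
cut_h = no_cut_h.copy()
cut_h[:, 1] = True                  # Gamma runs along the step

print(discrete_ms_energy(g, s, no_cut_h, no_cut_v))  # 16.0: the smoothness term pays for the step
print(discrete_ms_energy(g, s, cut_h, no_cut_v))     # 4.0: four boundary pairs with nu = 1
```

Placing $\Gamma$ along the step is cheaper here, which illustrates why the minimization prefers sharp curves between regions.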

The only assumption made for segmentation by means of MSF is that the regions to be segmented are separated by sharp curves. The disadvantage of the model is its tendency to approximate the curves with minimal surfaces [Oss97]; see also Section 3.

This disadvantage is compensated by the Theater-Wing model, which takes into consideration that the objects in the image occlude each other. Therefore, the curves describing the boundaries of the objects can be treated separately, and the problem of minimal surfaces is avoided [Mum94].

Another model is the Spectrogram model, which is able to segment textured surfaces by using their local spectral signature in the image [Mum94]. The next sections of this contribution concentrate on the properties and applications of the Cartoon model (Eq. (2.1)), as this is the most reviewed and analyzed model in the literature.



3 Properties of the Mumford-Shah Functional 

One of the most important properties of MSF is its tendency to perform the segmentation such that the obtained curves are minimal surfaces. Minimal surfaces have the property that they meet only three at a time, with angles of 120° between them [Oss97, Dav05]. This property, also known as the Mumford-Shah conjecture, results from the definition of $\Gamma$ as a finite union of $C^1$ curves [Dav05] and holds for the two-dimensional case.

Further, the set $\Gamma$ of curves does not contain isolated (short) curves or dispersed curves [LS94]. These properties underline the superiority of MSF over other image segmentation algorithms, which strongly depend on image features (e.g., corners and edges).

Another property of MSF is that it is well-posed only in its discrete formulation; the continuous formulation is ill-posed [Mum94]. [KKB+07] proposes an equivalent convex formulation (i.e., if a minimum of the functional is found, then it is the global minimum). Some other proposed regularizations are presented in Sections 4 and 5.

Figure 3.1: Example of two different optimal segmentation results of the left image using MSF [Dav05].

The segmentation problem by means of MSF always has a solution, i.e., the existence of a solution has been proven in [LS94], but its uniqueness is not guaranteed. A clear example is given in Figure 3.1: the left image can be optimally segmented in the two ways shown in the two right images.



4 Discrete vs. Continuous Models 



One of the common approaches to using MSF in its continuous formulation (see Eq. (2.1)) is to approximate $\Gamma$ with level sets [CV01]. More exactly, $\Gamma$ is the zero level set of a Lipschitz continuous function $\phi$ segmenting two regions of an image [Osh03]:

$$\phi : \Omega \to \mathbb{R}, \qquad \Gamma := \{u \in \Omega \mid \phi(u) = 0\}.$$

The two segmented regions of the image are given by the sets

$$\omega := \{u \in \Omega \mid \phi(u) > 0\}, \qquad \Omega \setminus \omega^{\circ} := \{u \in \Omega \mid \phi(u) < 0\},$$

with $\omega^{\circ}$ as the interior of the set $\omega$. It is important to mention that neither $\omega$ nor $\Omega \setminus \omega^{\circ}$ needs to be connected. In this case, $\Gamma$ is the set of pixels $u$ describing the boundary $\partial\omega$ of the set $\omega$.
The MSF functional is then defined as [CV01]:

$$E(b_1, b_2, \phi) := \iint_{\Omega} \big(g(u) - b_1\big)^2 H(\phi(u))\, du + \iint_{\Omega} \big(g(u) - b_2\big)^2 \big(1 - H(\phi(u))\big)\, du + \nu \iint_{\Omega} \delta(\phi(u))\, \|\nabla \phi(u)\|\, du. \qquad (4.1)$$

$b_1$ and $b_2$ are the mean grey values in the two segmented regions $\omega$ and $\Omega \setminus \omega^{\circ}$. $H(\phi)$ is the Heaviside function and $\delta(\phi(u)) := D^1 H(\phi(u))$ is the generalized derivative of the Heaviside function [Osh03]. $\nabla$ denotes the gradient operator.

The first two terms of the level-set formulation of MSF in Eq. (4.1) approximate the first term $E_d(s, \Gamma)$ in Eq. (2.1) by supposing that the regions to be segmented are homogeneous. Therefore, it is only necessary to measure the difference between the grey values of the original image and the mean grey values of the segmented regions (i.e., $b_1$ for the interior and $b_2$ for the exterior of the set $\omega$). The sum of the first two terms is then minimal if the zero level set of $\phi$ best approximates the boundaries of the objects [CV01]. The last term of Eq. (4.1) is a surface integral and computes, in the two-dimensional case, the length of the curves represented by the zero level set of $\phi$. The result of the segmentation is obtained by minimizing the functional in Eq. (4.1), e.g., by using a finite difference method [CV01].
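The following sketch evaluates Eq. (4.1) on a pixel grid for a given level set function. It is not the finite-difference minimization of [CV01]; it only shows how $b_1$, $b_2$ and the three terms are computed. The arctan-smoothed Heaviside function, its derivative used for $\delta$, the value $\varepsilon = 0.1$, unit grid spacing and the synthetic test image are assumptions of this sketch. A level set function whose zero level set follows the object boundary should yield a lower energy than a misplaced one.

```python
import numpy as np


def heaviside(x, eps):
    """Smoothed Heaviside function (arctan regularization, an assumed choice)."""
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(x / eps))


def dirac(x, eps):
    """Derivative of the smoothed Heaviside function, used in place of delta."""
    return eps / (np.pi * (eps ** 2 + x ** 2))


def chan_vese_energy(g, phi, nu=1.0, eps=0.1):
    H = heaviside(phi, eps)
    b1 = np.sum(g * H) / np.sum(H)              # mean grey value inside omega (phi > 0)
    b2 = np.sum(g * (1 - H)) / np.sum(1 - H)    # mean grey value outside omega
    dphi_r, dphi_c = np.gradient(phi)           # central differences, unit spacing
    length = np.sum(dirac(phi, eps) * np.hypot(dphi_r, dphi_c))
    energy = (np.sum((g - b1) ** 2 * H)
              + np.sum((g - b2) ** 2 * (1 - H))
              + nu * length)
    return energy, b1, b2


rows, cols = np.mgrid[0:32, 0:32]
g = np.zeros((32, 32))
g[8:24, 8:24] = 1.0                             # bright square on a dark background

# phi > 0 exactly on the square pixels (a Chebyshev-distance level set function)
phi_good = 8.0 - np.maximum(np.abs(rows - 15.5), np.abs(cols - 15.5))
# phi > 0 on a misplaced 8 x 8 square in the upper-left corner
phi_bad = 4.0 - np.maximum(np.abs(rows - 5.5), np.abs(cols - 5.5))

e_good, b1, b2 = chan_vese_energy(g, phi_good)
e_bad, _, _ = chan_vese_energy(g, phi_bad)
print(round(b1, 2), round(b2, 2), e_good < e_bad)
```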

The advantage of combining MSF with level sets as in Eq. (4.1) is that the smoothness assumption has a regularizing effect. The disadvantages of the model in Eq. (4.1) are that it can only segment foreground from background and that the results depend on the initialization of the curve and on the approximations of $H(\phi)$ and $\delta(\phi)$ [CV01].

Another possibility of segmenting images by means of MSF is to use a discrete formulation similar to the continuous one [EXSE07]:

$$E(b_1, b_2, \Gamma) := \sum_{u} \big(g(u) - b_1\big)^2 \omega_u + \sum_{u} \big(g(u) - b_2\big)^2 (1 - \omega_u) + \nu \sum_{u_k} \sum_{u_l \in \mathcal{N}(u_k)} w_l\, \omega_{u_k} (1 - \omega_{u_l}), \qquad (4.2)$$

where $\omega_u$ is a binary variable:

$$\omega_u = \begin{cases} 1, & \phi(u) > 0, \\ 0, & \text{else.} \end{cases}$$

The first two terms of the functional in Eq. (4.2) are similar to the first two terms of Eq. (4.1) and have the same functionality. The third term measures the length of the curves of $\Gamma$ by means of the Cauchy-Crofton formula [BK03], i.e., if the curves $\Gamma$ lie between two neighbouring pixels $u_l$ and $u_k$, then a weight $w_l$ is added, depending on which neighbour of $u_k$ the pixel $u_l$ is (e.g., left or right neighbour).

The minimization of the functional in Eq. (4.2) is done using the graph cuts algorithm [BK03, EXSE07], which searches for the minimum cut (equivalently, the maximum flow) in a graph that has the pixels of the image as nodes and the values of the respective terms of Eq. (4.2) as edge costs [BK03].
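A max-flow solver is beyond a short example, so the sketch below only evaluates Eq. (4.2) and replaces the graph cut by exhaustive search over all labelings of a tiny 3 × 3 image with fixed $b_1$ and $b_2$. It shows which labeling the minimization should return, not how graph cuts find it. A uniform weight $w_l = \pi/4$ for all four neighbours is an assumption of this sketch; the Cauchy-Crofton construction in [BK03] determines the actual weights.

```python
import itertools
import math

import numpy as np


def energy_42(g, omega, b1, b2, nu=1.0, w=math.pi / 4):
    """Energy of Eq. (4.2) for a binary labeling omega on a 4-neighbourhood.

    The uniform neighbour weight w is an assumption of this sketch.
    """
    om = omega.astype(float)
    data = np.sum((g - b1) ** 2 * om) + np.sum((g - b2) ** 2 * (1.0 - om))
    # Summing omega_k * (1 - omega_l) over ordered neighbour pairs counts every
    # unordered pair with different labels exactly once.
    boundary = 0.0
    for a, b in ((om[:, :-1], om[:, 1:]), (om[:-1, :], om[1:, :])):
        boundary += np.sum(a * (1.0 - b) + b * (1.0 - a))
    return data + nu * w * boundary


def brute_force_minimum(g, b1, b2, nu=1.0):
    """Exhaustive search over all 2^(#pixels) labelings; only feasible for tiny images."""
    best_energy, best_omega = math.inf, None
    for bits in itertools.product((0, 1), repeat=g.size):
        omega = np.array(bits).reshape(g.shape)
        e = energy_42(g, omega, b1, b2, nu)
        if e < best_energy:
            best_energy, best_omega = e, omega
    return best_energy, best_omega


g = np.array([[0.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
energy, omega = brute_force_minimum(g, b1=1.0, b2=0.0)
print(omega)
print(energy)   # pi: the data terms vanish and four boundary pairs of weight pi/4 remain
```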

The advantage of using the discrete formulation of MSF minimized by means of graph cuts (besides its well-posedness) is that the result is independent of initializations. Moreover, graph cuts are guaranteed to converge to a "good" minimum; i.e., a bound is given in [Vek99]. The disadvantage of being able to segment only foreground from background remains.



5 Applications in Image Processing: the image registration case

Up to now, the only application of MSF mentioned was image segmentation. In this section, further applications and ways of employing MSF in image processing are presented.

Besides the usual segmentation of an image into foreground and background, the detection of objects in images is an important topic in image processing. Two contributions employing MSF for object detection are worth mentioning:

• The first one proposes a method to extend the continuous formulation of MSF by another term, which contains prior information about $\Gamma$ (e.g., the shape of the curves) [CTWS02]. This term has a regularizing effect, such that the objects can be correctly detected and segmented in the images, even if they are partially occluded.

• The second contribution provides the connection between snakes, geodesics, and MSF by proving that the minimization of MSF is equivalent to the minimization of the geodesic snake functional [BEV+05].
Another important domain where MSF is useful is image registration. The main purpose here is to find a transformation $f : \Omega_{g_i} \to \mathbb{R}^2$ between two images $g_i : \Omega_{g_i} \to \mathbb{R}$ and $g_j : \Omega_{g_j} \to \mathbb{R}$ such that

$$d(g_j, g_i \circ f) \to \min,$$

with $d$ being an arbitrary distance function that measures the differences between two images. The registration by means of MSF is an edge-based registration; i.e., the only function of $f$ is to warp the curves of $\Gamma$ from one image into the other; see Eq. (5.1).
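As a deliberately simplified illustration of $d(g_j, g_i \circ f) \to \min$ for edge-based registration, the sketch below restricts $f$ to integer translations, represents the curves by binary edge maps, and uses the sum of squared differences as $d$. The translation model, the distance, and the function names are assumptions of this sketch, not the method of this contribution.

```python
import numpy as np


def shift(img, dy, dx):
    """Translate img by (dy, dx) pixels, filling uncovered pixels with zeros (no wrap-around)."""
    h, w = img.shape
    out = np.zeros_like(img)
    if abs(dy) >= h or abs(dx) >= w:
        return out
    src_r = slice(max(0, -dy), min(h, h - dy))
    dst_r = slice(max(0, dy), min(h, h + dy))
    src_c = slice(max(0, -dx), min(w, w - dx))
    dst_c = slice(max(0, dx), min(w, w + dx))
    out[dst_r, dst_c] = img[src_r, src_c]
    return out


def edge_distance(edges_j, edges_i, dy, dx):
    """d(g_j, g_i o f) on edge maps for the translation f(u) = u - (dy, dx)."""
    return float(np.sum((edges_j - shift(edges_i, dy, dx)) ** 2))


def register_translation(edges_i, edges_j, max_shift=5):
    """Exhaustive search over integer translations; returns the minimizing (dy, dx)."""
    candidates = [(dy, dx) for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    return min(candidates, key=lambda t: edge_distance(edges_j, edges_i, *t))


edges_i = np.zeros((20, 20))
edges_i[5, 5:11] = 1.0       # top side of a square outline
edges_i[10, 5:11] = 1.0      # bottom side
edges_i[5:11, 5] = 1.0       # left side
edges_i[5:11, 10] = 1.0      # right side
edges_j = shift(edges_i, 2, -3)                 # the same curves, displaced in image j
print(register_translation(edges_i, edges_j))   # (2, -3)
```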

The functional for registering two images therefore comprises, among other terms, the segmentation terms of the standard MSF (Eq. (2.1)) for both images:

$$E(s_i, s_j, \Gamma, f) := \mu \iint_{\Omega_{g_i}} \big(s_i(u) - g_i(u)\big)^2 du + \mu \iint_{\Omega_{g_j}} \big(s_j(u) - g_j(u)\big)^2 du + \iint_{\Omega_{g_i} \setminus \Gamma} \|\nabla s_i(u)\|^2 du + \iint_{\Omega_{g_j} \setminus f(\Gamma)} \|\nabla s_j(u)\|^2 du + \nu \mathcal{H}^1(\Gamma). \qquad (5.1)$$

The results are two segmented images $s_i$ and $s_j$ along with the transformation function $f$ and one set of curves $\Gamma$, which best segment both images $g_i$ and $g_j$. Therefore, the first two terms, measuring the difference between the original and the segmented images, are alike. The third and fourth terms ensure the smoothness of the resulting segmented images, such that the smoothness of the second segmented image $s_j$ is demanded on its support $\Omega_{g_j}$ without the transformed set of curves $f(\Gamma)$. As there is only one set of curves for both images, only one term is necessary to measure their length.

Figure 5.1: The tiger captured in the image series in Figure 5.2.

As a regularization approach, [Dro05] proposes the Dirichlet boundary condition, which is formulated as an additional term of the functional in Eq. (5.1). [Dro05] registers pairs of proton density (PD) and magnetic resonance (MR) images.

This contribution proposes another possibility of extending MSF for the simultaneous pairwise registration of stereo image series (i.e., more than two images).
Figure 5.2: Stereo and spectral image series of the scene in Figure 5.1, acquired with a camera array. The cameras were equipped with different spectral filters. The central transmission wavelength of each spectral acquisition filter is written in the upper right corner of the respective image.



Such an extension could be very useful, particularly for combined stereo and spectral image series, to which standard stereo registration algorithms cannot be applied. Due to the spectral component, the images in the combined stereo and spectral series have different grey values for the same object point; see Figure 5.2. Figure 5.1 shows the originally acquired scene (an orange tiger with black and white stripes).



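There are two possibilities of registering such combined image series by means of MSF:

• The first one is to build a functional that contains the terms of MSF from Eq. (2.1) for each image and, additionally, a term that measures the dissimilarity between the segmentations:

$$E(s_1, \ldots, s_n, \Gamma_1, \ldots, \Gamma_n) := \sum_{i=1}^{n} \Big( \mu \iint_{\Omega_{g_i}} \big(s_i(u) - g_i(u)\big)^2 du + \iint_{\Omega_{g_i} \setminus \Gamma_i} \|\nabla s_i(u)\|^2 du + \nu \mathcal{H}^1(\Gamma_i) \Big) + \alpha \Big( \iint \min d_e(u, \ldots)\, du + \iint h\big(\min \ldots\big)\, du \Big) \qquad (5.2)$$

with $d_e$ as a distance measure that takes the disparity into consideration (i.e., the difference between the positions of corresponding pixels in the images, which is inversely proportional to depth) and $h$ a function that ensures robustness. $\alpha$ is a weight for the dissimilarity term. The result consists of the segmented images $s_1, \ldots, s_n$ and the sets of curves $\Gamma_1, \ldots, \Gamma_n$.

• The second possibility is to extend Eq. (5.1). For this, a labeling function $z : \Omega_z \to \mathcal{L}$ is defined on the set of all pixels $\Omega_z := \Omega_{g_1} \cup \ldots \cup \Omega_{g_n}$ in the images of the series. The labels are disparities between the pixels of a chosen image pair of the series [GFHB08]. Consequently, $z$ is a function describing corresponding pixels in the images:

$$E(s_1, \ldots, s_n, \Gamma, z) := \mu \sum_{i=1}^{n} \iint_{\Omega_{g_i}} \big(s_i(u) - g_i(u)\big)^2 du + \sum_{i=1}^{n} \iint_{\Omega_{g_i} \setminus \Gamma_i} \|\nabla s_i(u)\|^2 du + \nu \mathcal{H}^1(\Gamma), \qquad (5.3)$$

where $\Gamma$ is the set of curves determined in one selected image and $\Gamma_i$ is computed by image warping using the function $z$ [FL04]. In this case, besides the segmented images, the results comprise only one set of curves $\Gamma$, which should best segment all images of the series, and the function $z$, which describes the registration.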



E(si , . . .s„,r, z) := 

V JJ ( s i( u ) - 9i(u)) 2 du + Y^ JJ ||V Sl Hfd M | + ^(D : 

i V >!•; 

(5.3) 



whereas F is the determined set of curves in one selected image. is 
computed by image warping using the function z IIFL04II . In this case, the 
results comprise besides the segmented images only one set of curves T, 
which should best segment all images of the series and the function z, which 
describes the registration. 



The advantage of the first formulation in Eq. (5.2) is that, for images that are very different (e.g., spectral images, in which the contrast between neighbouring image regions differs from image to image; see Figure 5.2), a segmentation of each image is found, even if these segmentations differ; see Figure 5.3. In the second case, Eq. (5.3), the solution is forced to one set of curves, which might fail in some cases (i.e., no set of curves $\Gamma$ is found). As an example, the segmented images in the lower row of Figure 5.3 are completely different from the segmented images in the upper row.

Figure 5.3: Segmented images from Figure 5.2 using MSF from Eq. (4.2).



Even though the segmentation results with MSF are very promising, further experiments are necessary to prove its superiority over the methods employed until now to register combined image series [GMHB08, GFHB08]. There, the images were segmented using the watershed transformation [GW08] and then registered by comparing the features of the obtained regions. The main problem of this approach was that the segmentations of the images were very different (see Figure 5.4), which made the feature-based registration difficult.

Summing up, the advantages of using MSF for segmenting combined image series are:

• The segmentation results for the images are similar, at least for images acquired in neighbouring spectral bands; see Figure 5.3.

• The segmentation and registration steps are performed simultaneously.

• No prior information or parametrization (e.g., for the segmentation) is required.



The disadvantages that could be established until now are:

• Unlike the watershed transformation, which gives a detailed segmentation of each image (see Figure 5.4), MSF segments only foreground from background. This means that only some edges (those of the segmented region) can be used for registration. The computed transformation might therefore not be representative of all pixels in the image. As an example, the segmented regions in Figure 5.3 in the images acquired at 800 and 850 nm may at most be representative of the face of the tiger, but not of its legs, since the depth values of the respective scene planes are different. A possible solution is a sequential segmentation and registration that masks the regions previously registered.

• In the case of combined stereo and spectral images, it is difficult to find a common segmentation for all images.



For completeness, some other applications of MSF in image processing worth mentioning are inpainting [ES02] and dynamic texture segmentation [DCFS03].
6 Conclusions and Future Work

This contribution presented a short overview of the Mumford-Shah functional and its applications in image processing. It has been observed that mainly one of the four models given by [Mum94], namely the Cartoon model, is employed in image processing. Since MSF is ill-posed, regularizations are needed. These can be achieved either by adding a term to the functional (e.g., describing boundary conditions) or by assuming that the regions to be segmented are homogeneous.

One important image processing application for which MSF can be employed is image registration. This contribution presented some new ideas for registering combined stereo and spectral image series by means of MSF. Even though the first investigations are very promising, a more detailed analysis and further tests are required. The results will be published in future contributions.
Abstract: Local Bayesian fusion approaches reduce the high computational costs caused by Bayesian fusion. This paper mainly deals with focussed Bayesian fusion, a special local Bayesian fusion technique that is easily realizable in practice. An interval scheme for global probabilities is derived. Using concepts from information theory and decision theory, further new results concerning the global meaning of focussed Bayesian fusion and its consistency with Bayesian decision theory are presented.



1 Introduction 

To automate the combination of information from several information sources, an adequate fusion methodology is needed. The Bayesian fusion methodology fulfills all essential requirements on a reasonable fusion methodology [BSW07]. It is mathematically founded and theoretically has a comprehensive range of applications. Information is represented by probability distributions in the sense of the Degree of Belief interpretation. By this, the Bayesian fusion methodology accounts for every kind of uncertainty in an adequate manner [Lin87]. However, its practical application to real-world examples is often critical due to high storage and computational costs.

By the use of local Bayesian fusion approaches, the high costs caused by Bayesian fusion can be circumvented. In local Bayesian fusion, the complete calculation of the posterior distribution for all possible values of the Properties of Interest (PoI) is avoided. For this, ideally all task-specific information, i.e., prior knowledge and source-specific information, is pre-evaluated. The aim is the detection of values of the PoI that have a higher potential to be the true value than others.






Jennifer Sander 



Probabilistic statements are then made with respect to modifications of the usual Bayesian model.

Restricting the space $Z$ that specifies the possible values of the PoI delivers a straightforward fusion scheme. With respect to the PoI, the actual Bayesian fusion task is here completely restricted to $U \subseteq Z$. The set $U$ contains the most task-relevant part of $Z$. All other possible values of the PoI, i.e., the elements of $Z \setminus U$, are completely ignored. Because of its theoretic similarity to the focussing mechanism of the Bayesian fusion methodology [BHSG08], this technique has been termed focussed Bayesian fusion in [SHGB09].



1.1 Contributions 

Focussed Bayesian fusion has already been considered, in particular, in [SB08] and [SHGB09]. The aim of the present paper is to report new results on this local Bayesian fusion technique. The main focus of the investigations is the global meaning of focussed Bayesian fusion. Additionally, questions with regard to its consistency with decision theory are addressed. Within this report, focussed Bayesian fusion is looked at from different perspectives: purely probabilistic, information theoretic, and decision theoretic views are adopted.



1.2 Structure 



Necessary basics of Bayesian fusion are briefly reviewed in Section 2. For a more detailed introduction, the reader is referred, for example, to [BSW07, Bey99, BS04]. In Section 3, a purely probabilistic view is adopted to derive an interval scheme for global probabilities. Provided that the global prior relevance of the local context $U$ is known, this interval scheme is computable within a focussed Bayesian model. To this end, Section 3.1 describes what knowledge about global probabilities is obtainable if only their focussed equivalents are known. These results are also generally useful for probabilistic modelling in cases in which it is unknown whether the domains are chosen comprehensively enough. Possible negative consequences of the closed-world assumption of Bayesian modelling, which usually makes quantitative Degree of Belief statements possible, become clear. Facts concerning a meaningful construction of focussed Bayesian models are used in Section 3.2 for the derivation of lower bounds for global probabilities, which are computable if the prior relevance of the local context is ratable. These bounds complement known upper bounds, which are reviewed in Section 3.1. The resulting probability interval scheme is analyzed in Section 3.3. As shown in [SHGB09], an information theoretic quality indicator for local Bayesian fusion makes sense on the basis of both

Further Investigation of Focussed Bayesian Fusion 






the Minimum Information Principle (MIP) [Wil80] and Walker's minimization rule for Bayesian inference [Wal06]. The information theoretic quality indicator also serves as a construction rule for meaningful focussed Bayesian models. In Section 4.1, the meaning of Walker's minimization rule for local Bayesian fusion is further clarified without recourse to the theoretically more complex MIP. In Section 4.2, the value of knowing the global prior and the global posterior relevance, respectively, of the local context is rated information theoretically. For this, an information measure is used, which is the basis of the analysis in Section 4.1. In Section 5, local Bayesian fusion is considered in the context of decision theory. Bayesian theory distinguishes itself by its consistency with decision theory [Bey99]. Bayesian inference is even a special form of a decision problem [BS04]. The consistency of focussed Bayesian fusion with decision theory is demonstrated in Section 5.1. A way to solve a decision problem on the basis of a focussed Bayesian model by the use of the derived probability interval scheme is discussed in Section 5.2. Finally, Section 6 is a short conclusion.

1.3 Remarks

By transforming all information into a probabilistic representation, fusion tasks, which may involve very diverse kinds of information, become methodically unified. To ensure the comprehensive validity of the results about local Bayesian fusion, their development is based on the assumption that the pre-evaluation is done after performing the transformation.

In case of statistical conditional independence of the information contributions, the individual transformation of each of them into a probabilistic representation by a source-specific Likelihood is sufficient to realize exact Bayesian fusion; see Section 2. Here, the Likelihood with respect to all information contributions results from the multiplication of the source-specific ones. For mathematical simplicity, the presented results are based on the consideration of the Likelihood with respect to all information contributions. However, their special adaptation to source-specific Likelihoods is straightforward, at least if the statistical conditional independence of the information contributions holds.



2 Bayesian Fusion 

By $z = (z_1, \ldots, z_N) \in Z = Z_1 \times \ldots \times Z_N$, $N \in \mathbb{N}$, the PoI are denoted. We assume that the quantity $z$ adopts a certain, but not directly observable, true value. Let $d_s \in D_s$ denote the information contribution of the information source $s \in \{1, \ldots, S\}$, $S \in \mathbb{N}$, and let $d = (d_1, \ldots, d_S)$ summarize all source-specific information.

In Bayesian fusion, prior knowledge with respect to the PoI is represented probabilistically by the prior distribution $p(z)$, and source-specific information specifies the Likelihood $p(d \mid z)$. Bayesian inference on the basis of Bayes' theorem delivers the posterior distribution $p(z \mid d)$:

$$p(z \mid d) = \frac{p(d \mid z)\, p(z)}{p(d)} \propto p(d \mid z)\, p(z), \qquad z \in Z.$$

If the information contributions are statistically independent given $z$, which can be assumed, for example, if heterogeneous information sources have to be fused, a sequential inference scheme is mathematically exact:

$$p(z \mid d) \propto \prod_{s=1}^{S} p(d_s \mid z)\, p(z), \qquad z \in Z.$$
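A small numerical sketch of the sequential scheme follows, assuming a one-dimensional discretized $Z$ (so $N = 1$), a uniform prior, and two hypothetical Gaussian source models. Multiplying all Likelihoods at once and fusing the sources one after the other give the same posterior, since the normalizing constants cancel.

```python
import numpy as np

z = np.linspace(0.0, 10.0, 101)              # discretized PoI space Z with N = 1 (step 0.1)
prior = np.full(z.size, 1.0 / z.size)        # uniform prior p(z)


def likelihood(d_s, sigma_s):
    """Hypothetical Gaussian sensor model p(d_s | z), up to a constant factor."""
    return np.exp(-0.5 * ((d_s - z) / sigma_s) ** 2)


def normalise(p):
    return p / p.sum()


contributions = [(4.2, 1.0), (5.1, 0.5)]     # hypothetical (d_s, sigma_s) for S = 2 sources

# Batch: multiply all source-specific Likelihoods with the prior at once.
batch = normalise(prior * np.prod([likelihood(d, s) for d, s in contributions], axis=0))

# Sequential: fuse one source after the other; each posterior becomes the next prior.
sequential = prior
for d, s in contributions:
    sequential = normalise(likelihood(d, s) * sequential)

print(np.allclose(batch, sequential), z[np.argmax(sequential)])
```

Both variants evaluate the posterior on every grid point of $Z$, which is exactly the cost that grows as $O(\xi^N)$ for higher-dimensional PoI.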

The computational complexity for the calculation of the posterior distribution is $O\big(\prod_{n=1}^{N} |Z_n|\big) = O(\xi^N)$, where $\xi = \sqrt[N]{\prod_{n=1}^{N} |Z_n|}$ denotes the geometric mean value of the $|Z_n|$. To obtain meaningful estimates for the true value of $z$, decision theoretic concepts have to be used. Because they have to be applied to $p(z \mid d)$, Bayesian fusion demands considerable computational resources regardless of whether the whole posterior distribution or only a simple estimate for the true value is demanded as the final result. It becomes clear that the complete calculation of $p(z \mid d)$ for all $z \in Z$ has to be avoided to reduce the computational complexity of Bayesian fusion. Local Bayesian fusion approaches realize this objective.



3 Probability Bounds 

3.1 Meaning of Restricted Probability Statements 

The restriction of $Z$ to $U$ mathematically corresponds to conditioning on the local context $U$; see [SHGB09]. The focussed posterior distribution $p_U(z \mid d)$ is



1 The components of $d$ and $z$ are discrete or continuous, depending on the fusion task. The term distribution is used for a density in the case of continuous quantities and for a discrete distribution in the case of discrete quantities.

2 By $|Z_n|$, the cardinality of a discrete subspace $Z_n$ or the Lebesgue measure (length, area, or volume) of a continuous subspace $Z_n$ is denoted.






given by

$$p_U(z \mid d) = p(z \mid d, U) = \frac{p(z \mid d)}{P(U \mid d)}, \qquad z \in U. \qquad (3.1)$$

In the focussed Bayesian model, events³ $E \subseteq Z \setminus U$ are assumed to be impossible. Because of the normalization requirement on a probability distribution, the global posterior probability mass⁴ $P(Z \setminus U|d) = \int_{Z \setminus U} p(z|d)\,dz$ of $Z \setminus U$ is redistributed onto $U$. This is done for $z \in U$ by dividing the global posterior distribution $p(z|d)$ by the global posterior probability mass $P(U|d) = \int_U p(z|d)\,dz$ of the local context $U$. As a result, the global posterior probability $P(E|d) = \int_E p(z|d)\,dz$ of events $E \subseteq U$ is increased by the distortion factor $1/P(U|d)$. Hence, within the local context $U$, local posterior probability statements $P_U(\cdot|d)$ represent upper bounds for their global equivalents:

$$P(E|d) \le P_U(E|d), \qquad E \subseteq U. \qquad (3.2)$$

Two factors influence the degree to which the absolute value of the posterior probability of an event $E \subseteq U$ is distorted by the focussing. The smaller the global posterior relevance $P(U|d)$ of the local context $U$, the more global posterior probability statements are overestimated by the focussed posterior distribution; i.e., for two different local contexts $U_1, U_2 \subseteq Z$ and an event $E \subseteq U_1 \cap U_2$, it holds that
$$P(U_1|d) < P(U_2|d) \;\Rightarrow\; P_{U_2}(E|d) < P_{U_1}(E|d).$$

The smaller the global probability $P(E|d)$ of an event $E \subseteq U$, the less its absolute value is increased by the focussing: for $E_1, E_2 \subseteq U$, one has
$$P(E_1|d) < P(E_2|d) \;\Rightarrow\; P_U(E_1|d) - P(E_1|d) < P_U(E_2|d) - P(E_2|d). \qquad (3.3)$$

It should be remarked that certain conclusions drawn from a focussed Bayesian model are globally fully meaningful. These are comparisons of posterior probability statements for events $E \subseteq U$ by means of probability ratios [SB08]. In particular, the focussed posterior distribution induces the same preference ordering on events $E \subseteq U$ as the global posterior distribution does.
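The following Python sketch illustrates (3.1) to (3.3) and the invariance of probability ratios on a toy discrete posterior. The posterior values and the helper names `focus` and `prob` are assumptions made for illustration only.

```python
def focus(posterior, U):
    """Focussed posterior p_U(z|d) = p(z|d) / P(U|d) for z in U, cf. (3.1)."""
    mass_U = sum(posterior[z] for z in U)  # global posterior relevance P(U|d)
    return {z: posterior[z] / mass_U for z in U}


def prob(dist, E):
    """Probability of a finite event E (a set of states) under dist."""
    return sum(dist[z] for z in E)


posterior = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}   # toy global posterior
U = {"a", "b", "c"}                                     # local context, P(U|d) = 0.9
focussed = focus(posterior, U)

# (3.2): focussed probabilities are upper bounds for events inside U.
upper_ok = all(prob(posterior, {z}) <= prob(focussed, {z}) for z in U)

# (3.3): the less probable event is distorted less in absolute terms.
gain_c = prob(focussed, {"c"}) - prob(posterior, {"c"})
gain_a = prob(focussed, {"a"}) - prob(posterior, {"a"})

# Probability ratios, and hence the preference ordering, are preserved.
ratio_global = posterior["a"] / posterior["b"]
ratio_local = focussed["a"] / focussed["b"]
```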

3.2 Special Knowledge at Focussing 

If the pre-evaluation is performed with respect to the probabilistic information representations, $U$ is defined to contain those elements $z$ of $Z$ for which $p(d|z)$ or $p(z)$ is high enough⁵. The introduction of a threshold $\delta$ for the value of the likelihood with respect to $z$ is critical because there is no normalization requirement for the likelihood with respect to $z$. As a consequence, $\delta$ has to depend on parameters that give it a relative nature. For example, the dependency $\delta = \delta\big(\max_{z \in Z} p(d|z)\big)$ makes sense [SB08]. Another kind of dependency is described in Section 4.1.

3 Events are sets to which a probability is assigned.

4 The notation $\int_Z \dots dz$ stands for integration over continuous subspaces and for summation over discrete subspaces of $Z$.

The results of Section 3.1 hold analogously for prior probability statements. The global prior probability $P(U) = \int_U p(z)\,dz$ rates the global prior relevance of the local context $U$. By the focussing, the prior relevance of $U$ is modified to one. However, in many focussed Bayesian fusion tasks, the value of $P(U)$ can be rated at least approximately.

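Assuming $P(U)$ to be known, the inequality $p(d|z) \le \delta$ for $z \in Z \setminus U$ delivers a lower bound for the global posterior relevance $P(U|d)$ of $U$:
$$
\begin{aligned}
P(U|d) &= \frac{\int_U p(d|z)\,p(z)\,dz}{p(d)}
= \frac{\int_U p(d|z)\,p(z)\,dz}{\int_U p(d|z)\,p(z)\,dz + \int_{Z \setminus U} p(d|z)\,p(z)\,dz} \\
&\ge \frac{\int_U p(d|z)\,p(z)\,dz}{\int_U p(d|z)\,p(z)\,dz + \delta \int_{Z \setminus U} p(z)\,dz}
= \frac{\int_U p(d|z)\,p(z)\,dz}{\int_U p(d|z)\,p(z)\,dz + \delta\,\big(1 - P(U)\big)} \\
&= \frac{\int_U p(d|z)\,p_U(z)\,dz}{\int_U p(d|z)\,p_U(z)\,dz + \delta\,\big(\frac{1}{P(U)} - 1\big)} =: \beta.
\end{aligned}
\qquad (3.4)
$$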

The focussed prior distribution $p_U(z)$ is related to the global prior distribution $p(z)$ by the identity $p_U(z) = p(z)/P(U)$, $z \in U$. The likelihood of the focussed Bayesian model is exactly an extract of the likelihood of the global Bayesian model. To make these connections clear, the general inference scheme for focussed Bayesian fusion [SB08] is cited:



$$p_U(z|d) \propto p(d|z)\,p_U(z), \qquad z \in U.$$

From these considerations, it becomes clear how $\beta$ can be computed within the focussed Bayesian model.
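As a sketch of how $\beta$ from (3.4) can be evaluated with focussed quantities only, the following Python function assumes a discrete $U$, a threshold $\delta$ with $p(d|z) \le \delta$ on $Z \setminus U$, and a rated value of $P(U)$. The function name and the toy numbers are hypothetical; the comparison with the full model is included only to illustrate that $\beta \le P(U|d)$.

```python
def beta_lower_bound(likelihood_U, prior_U, delta, prior_relevance):
    """Lower bound beta <= P(U|d) according to (3.4), using local quantities only.

    likelihood_U:    dict z -> p(d|z) for z in U
    prior_U:         dict z -> p_U(z) = p(z) / P(U) for z in U
    delta:           assumed bound with p(d|z) <= delta for every z outside U
    prior_relevance: (rated) global prior relevance P(U)
    """
    local_evidence = sum(likelihood_U[z] * prior_U[z] for z in prior_U)
    penalty = delta * (1.0 / prior_relevance - 1.0)
    return local_evidence / (local_evidence + penalty)


# Toy global model with a uniform prior on Z = {0, ..., 4}; U = {0, 1, 2}.
likelihood = {0: 0.9, 1: 0.7, 2: 0.5, 3: 0.05, 4: 0.02}
U = [0, 1, 2]
P_U = 3 / 5                                   # global prior relevance P(U)
beta = beta_lower_bound({z: likelihood[z] for z in U},
                        {z: 1 / 3 for z in U}, delta=0.05, prior_relevance=P_U)

# For comparison only (needs the global model): P(U|d) under the uniform prior.
true_relevance = sum(likelihood[z] for z in U) / sum(likelihood.values())
```

Here $\beta = 21/22 \approx 0.955$, while the true value is $P(U|d) = 2.1/2.17 \approx 0.968$.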

5 If the prior knowledge can be interpreted as an additional information contribution, the analysis can be unified [SB08]. However, this possibility is not used here.






By applying the lower bound for $P(U|d)$, the factor by which a focussed probability statement about an event $E \subseteq U$ is distorted relative to its global equivalent can be bounded from above:
$$\frac{P_U(E|d)}{P(E|d)} = \frac{1}{P(U|d)} \le \frac{1}{\beta}, \qquad E \subseteq U. \qquad (3.5)$$



By the use of $\beta$, lower bounds for global posterior probability statements can be calculated within the focussed Bayesian model:
$$P(E|d) \ge \beta\,P_U(E|d), \qquad E \subseteq U. \qquad (3.6)$$



3.3 Probability Interval Scheme 

An interval scheme for global posterior probability statements is delivered by (3.2) and (3.6): it holds that
$$P(E|d) \in \big[\beta\,P_U(E|d),\; P_U(E|d)\big], \qquad E \subseteq U.$$



These probability intervals are calculable within a focussed Bayesian model if $\delta$ is known and if the global prior relevance $P(U)$ of the local context $U$ can be rated.

The interval scheme can easily be extended to events $E \subseteq Z \setminus U$ because, by using (3.4), one obtains
$$P(E|d) \le 1 - P(U|d) \le 1 - \beta, \qquad E \subseteq Z \setminus U.$$



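Hence, we have
$$P(E|d) \in \big[l(E), r(E)\big] := \begin{cases} \big[\beta\,P_U(E|d),\; P_U(E|d)\big], & E \subseteq U, \\ \big[0,\; 1 - \beta\big], & E \subseteq Z \setminus U. \end{cases} \qquad (3.7)$$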



In an experimental evaluation, it has been demonstrated that $\beta$ is a qualitatively good bound for the global posterior relevance $P(U|d)$ of the local context $U$ in focussed Bayesian fusion. It becomes sharper with increasing concentration of the likelihood on $U$. There are cases in which the knowledge of the probability interval scheme (3.7) is sufficient to identify the maximum a posteriori estimate of a discrete global posterior distribution.
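The interval scheme (3.7) and its use for identifying a MAP estimate can be sketched for singleton events $\{z\}$ of a discrete $Z$. The sketch below is an illustration under that assumption; `state_intervals` and `certified_map` are hypothetical helper names, and the certification rule used (a lower bound exceeding every other upper bound) is one simple sufficient condition, not necessarily the criterion of the cited evaluation.

```python
def state_intervals(focussed, beta, outside_states):
    """Intervals [l({z}), r({z})] for singleton events according to (3.7)."""
    bounds = {z: (beta * p, p) for z, p in focussed.items()}          # z in U
    bounds.update({z: (0.0, 1.0 - beta) for z in outside_states})    # z in Z minus U
    return bounds


def certified_map(bounds):
    """Return z* if its lower bound exceeds every other upper bound, else None.

    In that case p(z*|d) >= l(z*) > r(z) >= p(z|d) for all z != z*,
    so z* is the global MAP estimate.
    """
    z_star = max(bounds, key=lambda z: bounds[z][0])
    lower = bounds[z_star][0]
    if all(lower > upper for z, (_, upper) in bounds.items() if z != z_star):
        return z_star
    return None


# Toy focussed posterior on U = {0, 1, 2} with beta = 21/22 (hypothetical values).
focussed = {0: 0.9 / 2.1, 1: 0.7 / 2.1, 2: 0.5 / 2.1}
bounds = state_intervals(focussed, beta=21 / 22, outside_states=[3, 4])
```

Here the lower bound of state 0 (about 0.409) exceeds the upper bound of state 1 (about 0.333) and the common upper bound $1 - \beta \approx 0.045$ of the ignored states, so state 0 is certified as the global MAP estimate without evaluating $p(z|d)$ outside $U$. The interval lengths $(1-\beta)\,p_U(z|d)$ also grow with the posterior probability, in line with (3.9).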

In [SHGB09], the small world formalism has been used to explain the consistency of local Bayesian fusion approaches with Bayesian theory in a demonstrative manner. Here, the essentials of the argument are reviewed and further expanded.
For the mathematical solution of a fusion task, adequate modelling is needed. It is not possible to model the real world completely. A global Bayesian model $(Z \times D, p(z,d))$ corresponds to a so-called model world. It contains at least that part of the real world that is meaningful for the fusion task. In local Bayesian fusion approaches, the actual fusion task is solved only with respect to a so-called local world, which is a proper subset⁶ of the model world. By construction, the local world has a high chance of being task relevant. In focussed Bayesian fusion, the local world corresponds to the focussed Bayesian model⁷ $(U \times D, p_U(z,d))$.

Focussing makes events $E \subseteq U$ more probable than they globally are. With knowledge of the bound $\beta$ for the global posterior relevance $P(U|d)$ of the local context $U$, this overestimation can in principle be reversed. However, this reversion is too strong and, as a consequence, makes events $E \subseteq U$ less probable than they globally are.

By additionally introducing a reference world that lies between the real world and the model world, all derived probability bounds can be included in the small world formalism. The reference world has the form $(R \times D, p_R(z,d))$ with $Z \subseteq R$. Thereby, the probability distribution $p_R(z,d)$ is not known completely. Taking (3.7) into account, one obtains for the corresponding posterior distribution:
$$p_R(z|d) = \begin{cases} \beta\,p_U(z|d), & z \in U, \\ 1 - \beta, & z \in Z \setminus U, \\ \;?, & z \in R \setminus Z. \end{cases} \qquad (3.8)$$

In focussed Bayesian fusion, the local world also represents a focussing of such a reference world. The exact values of reference posterior probability statements $P_R(E|d)$ are known for events $E \subseteq U$. They are equal to the lower bounds $l(E)$ of the probability intervals in (3.7). For events $E \subseteq U$, the upper bounds in (3.7) are the focussed probability statements. Hence, for events $E \subseteq U$, the length $d(E) := r(E) - l(E)$ of the probability intervals is identical to the absolute value of the distortion that results from focussing the reference world on the local world. The distortion factor in this transition is $1/\beta$.

By applying the previous results, conclusions concerning $d(E)$ can easily be derived. The larger $\beta$, the smaller $d(E)$ is for all $E \subseteq U$. Because of (3.4), a large value of $\beta$ indicates that the global posterior relevance $P(U|d)$ of the local context $U$ is large. Hence, large values of $P(U|d)$ generally correspond to small probability intervals for events $E \subseteq U$. It is directly clear from (3.7) that this connection also holds for events $E \subseteq Z \setminus U$. For events $E_1, E_2 \subseteq Z \setminus U$, it holds that $d(E_1) = d(E_2) = 1 - \beta$, i.e., the probability intervals of such events all have the same length. In contrast, the length $d(E)$ of a probability interval for an event $E \subseteq U$ depends on the global posterior probability of $E$. More precisely, for $E_1, E_2 \subseteq U$, it holds that
$$P(E_1|d) < P(E_2|d) \;\Rightarrow\; d(E_1) < d(E_2). \qquad (3.9)$$

6 All kinds of subset relations within the small world formalism are task specific, see [SHGB09].

7 Strictly speaking, in a focussed Bayesian model of the kind $(U \times D, p_U(z,d))$, the probability of events $E \not\subseteq U$ is not defined. To demonstrate the connection between global and local Bayesian models explicitly, such probability values have nevertheless been introduced and declared to be zero.

The proof of (3.9) is based on (3.8). Because of (3.8), it is clear that the reference posterior probability $P_R(E|d)$ influences the length of the probability interval: since $d(E) = (1-\beta)\,P_U(E|d) = \frac{1-\beta}{\beta}\,P_R(E|d)$ for $E \subseteq U$, the smaller $P_R(E|d)$, the smaller $d(E)$ is. Combining (3.1) and (3.8), one obtains for $E \subseteq U$ the identity $P_R(E|d) = \beta\,P(E|d)/P(U|d)$. Since $\beta$ and $P(U|d)$ do not depend on the choice of $E$, (3.9) follows from this identity.

A comprehensive analysis of the best use of the probability interval scheme may be an important topic for further research. In this regard, it should be stressed that by using probability intervals, uncertainty is not described purely probabilistically. However, we generally hold the view that the probabilistic calculus in the Bayesian sense is completely sufficient to handle every kind of uncertainty.



4 Information Theoretic Analysis 

4.1 Meaning of Walker's Minimization Rule

Walker's minimization rule is based on Zellner's investigation of the information-theoretic optimality of Bayesian inference, see for example [Zel02]. He demonstrated that a distribution $r(z)$ embodies given information in the form of prior knowledge and source-specific information in an optimal manner if it minimizes the functional
$$F(r(z)) = -\int_{z \in Z} r(z) \log p(d|z)\,dz + \mathrm{KD}[r(z), p(z)]. \qquad (4.1)$$
In (4.1), KD denotes the Kullback-Leibler distance. The domain of $F$ is the set of all probability distributions $r(z)$ on $Z$ with $\mathrm{KD}[r(z), p(z)] < \infty$.
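For a discrete $Z$, the claim that the posterior minimizes (4.1) can be checked numerically. In the sketch below (hypothetical helper names, natural logarithm assumed), the functional evaluated at the posterior equals $-\log p(d)$, because $F(r) = \mathrm{KD}[r(z), p(z|d)] - \log p(d)$; any other distribution yields a larger value.

```python
from math import log


def kd(r, q):
    """Kullback-Leibler divergence KD[r, q] on a finite Z (0 log 0 := 0)."""
    return sum(rz * log(rz / q[z]) for z, rz in r.items() if rz > 0)


def zellner_functional(r, likelihood, prior):
    """F(r) = -sum_z r(z) log p(d|z) + KD[r, p], cf. (4.1), for discrete Z."""
    expected_log_lik = sum(rz * log(likelihood[z]) for z, rz in r.items() if rz > 0)
    return -expected_log_lik + kd(r, prior)


prior = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
likelihood = {0: 0.5, 1: 0.25, 2: 0.125}
evidence = sum(likelihood[z] * prior[z] for z in prior)              # p(d)
posterior = {z: likelihood[z] * prior[z] / evidence for z in prior}
```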

This minimization rule is structurally different from the minimization rule that underlies the MIP. When the MIP is applied to Bayesian inference, a functional over probability distributions on $Z \times D$ is minimized in the strict sense.
Figure 4.1: Schematic illustration of the information-theoretic value of additionally knowing $P(U)$ and $P(U|d)$, respectively, in focussed Bayesian fusion. (Legend: global Bayesian fusion; focussed Bayesian fusion; $P(U)$ additionally known; $P(U|d)$ additionally known.)



In Bayesian inference, the amount of information about the quantity of interest $y$ that is provided by the given data $x$ is usually⁸ defined to be
$$I_{p(y)}[x] := \mathrm{KD}[p(y|x), p(y)]. \qquad (4.2)$$

Because of the identity $F(p(z|d,U)) - F(p(z|d)) = I_{p(z|d)}[U]$, the difference $F(p(z|d,U)) - F(p(z|d))$ measures the amount of information that is additionally delivered about $z$ by the assumption “$z \in U$”, which underlies focussed Bayesian fusion. Clearly, the local context $U$ should be chosen such that the value of this quantity is low.

As explained in [SHGB09], it follows from the identity $I_{p(z|d)}[U] = -\log P(U|d)$ that, in focussing, one should preferably ignore those $z \in Z$ for which both $p(d|z)$ and $p(z)$ adopt low values. Thereby, the additional requirement that $\mathrm{KD}[p(z|U,d), p(z|d)]$ has to be kept low enough gives the thresholds for $p(z)$ and $p(d|z)$ a relative nature.
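The identity $I_{p(z|d)}[U] = \mathrm{KD}[p(z|d,U), p(z|d)] = -\log P(U|d)$ can be verified on a toy discrete posterior. The sketch below (hypothetical helper names and numbers, natural logarithm) also shows why ignoring states with low posterior mass costs little information.

```python
from math import log


def kd(r, q):
    """Kullback-Leibler divergence KD[r, q] on a finite Z (0 log 0 := 0)."""
    return sum(rz * log(rz / q[z]) for z, rz in r.items() if rz > 0)


def focussing_information(posterior, U):
    """Return (KD[p(z|d,U), p(z|d)], -log P(U|d)); both equal I_{p(z|d)}[U]."""
    mass_U = sum(posterior[z] for z in U)
    focussed = {z: posterior[z] / mass_U for z in U}
    return kd(focussed, posterior), -log(mass_U)


posterior = {"a": 0.5, "b": 0.3, "c": 0.2}
info_drop_c, check_c = focussing_information(posterior, {"a", "b"})   # ignores c
info_drop_b, check_b = focussing_information(posterior, {"a", "c"})   # ignores b
```

Ignoring the less probable state c adds $-\log 0.8 \approx 0.22$ nats, whereas ignoring b adds $-\log 0.7 \approx 0.36$ nats, so the first local context is preferable.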



4.2 Knowledge of the Relevance of Local Contexts 



The knowledge of the global prior relevance $P(U)$ of the local context $U$ is indispensable for calculating the derived lower bounds of global posterior probability statements on the basis of a focussed Bayesian model, see Section 3.2. Here, the value of knowing the global prior and the global posterior relevance, respectively, of the local context $U$ is also rated in terms of information theory.



8 This common definition is given, for example, by Bernardo [BS04]. Lindley introduced another definition, see [Lin56]. The difference between the two definitions is often overlooked because the expected value of the amount of information about $y$ provided on average by data distributed according to $p(x)$, i.e., the mutual information between $x$ and $y$, is the same for both of them. Bernardo's definition seems to make more sense here because it corresponds to a real posterior analysis.
Figure 4.1 shows graphically which connections between a global and a focussed Bayesian model are information-theoretically quantifiable by the information measure defined in (4.2). It also marks to what extent the knowledge of $P(U)$ and $P(U|d)$, respectively, is needed for this.



Once focussed Bayesian fusion has been performed, the amount of information about $z$ that is locally provided by $d$, i.e., $I_{p(z|U)}[d]$, can be calculated. If the global prior relevance $P(U)$ of the local context $U$ is known, the amount of information that is delivered about $z$ solely by the assumption “$z \in U$” is additionally calculable because $I_{p(z)}[U] = -\log P(U)$. Another straightforward calculation delivers the identity
$$I_{p(z|U)}[d] + I_{p(z)}[U] = I_{p(z)}[d, U].$$



Hence, with knowledge of the global prior relevance $P(U)$ of the local context $U$, the amount of information that is delivered about $z$ by $d$ and the assumption “$z \in U$” together is also quantitatively ratable.
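The identity $I_{p(z|U)}[d] + I_{p(z)}[U] = I_{p(z)}[d, U]$ can be checked numerically for a discrete $Z$, using $I_{p(z)}[U] = \mathrm{KD}[p(z|U), p(z)] = -\log P(U)$ and $I_{p(z)}[d,U] = \mathrm{KD}[p(z|d,U), p(z)]$. The numbers and helper names below are assumptions for illustration.

```python
from math import log


def kd(r, q):
    """Kullback-Leibler divergence KD[r, q] on a finite Z (0 log 0 := 0)."""
    return sum(rz * log(rz / q[z]) for z, rz in r.items() if rz > 0)


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    return {z: w / total for z, w in weights.items()}


prior = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
likelihood = {0: 0.2, 1: 0.6, 2: 0.9, 3: 0.1}
U = {0, 1, 2}

prior_U = normalize({z: prior[z] for z in U})                        # p(z|U)
posterior_U = normalize({z: likelihood[z] * prior_U[z] for z in U})  # p(z|d,U)

local_info = kd(posterior_U, prior_U)          # I_{p(z|U)}[d], computable locally
focus_info = -log(sum(prior[z] for z in U))    # I_{p(z)}[U] = -log P(U)
joint_info = kd(posterior_U, prior)            # I_{p(z)}[d, U]
```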



On the other hand, with knowledge of the global posterior relevance $P(U|d)$ of the local context $U$, the amount of information that is delivered about $z$ by the assumption “$z \in U$” in addition to the information from $d$, i.e., $I_{p(z|d)}[U]$, is exactly calculable: it holds that $I_{p(z|d)}[U] = -\log P(U|d)$. In Section 4.1, this quantity has been identified as the basis for a construction rule for meaningful focussed Bayesian models. It is stressed that in Section 4.1, knowledge of the exact value of this quantity has not been assumed.



5 Decision Theoretic Analysis 

5.1 Fundamental Consistency of Focussed Bayesian Fusion

Bayesian inference aims at the probabilistic representation and further development of the state of knowledge about the PoI. Bayesian decision theory additionally considers the consequences that result from choosing an action $a \in A$. The consequences usually depend on the true value of the PoI.

A preference ordering on the set $A$ of available actions, subject to the true value of the PoI, can be specified by determining a utility function $u(a,z)$. An action $a_Z$ is globally optimal if it maximizes the expected utility with respect to the global posterior distribution, i.e., if it holds that
$$a_Z = \arg\max_{a \in A} \mathbb{U}[a; p(z|d)] \quad \text{with} \quad \mathbb{U}[a; p(z|d)] := \int_Z u(a,z)\,p(z|d)\,dz.$$
Usually, a decision maker is said to act rationally within this kind of decision-theoretic model if and only if he chooses a globally optimal action [RN03].

In focussed Bayesian fusion, the global posterior distribution is not known completely. The choice of an action may be based on the expected utility with respect to the focussed posterior distribution $p_U(z|d)$: an action $a_U$ is defined to be locally optimal if it holds that
$$a_U = \arg\max_{a \in A} \mathbb{U}[a; p_U(z|d)] \quad \text{with} \quad \mathbb{U}[a; p_U(z|d)] := \int_U u(a,z)\,p_U(z|d)\,dz.$$

Obviously, it may hold that
$$\mathbb{U}[a_Z; p(z|d)] > \mathbb{U}[a_U; p(z|d)],$$
i.e., a locally optimal action is in general not globally optimal. This may call into question the global rationality of a decision maker who uses a focussed Bayesian model.

This seeming inconsistency arises if the storage and computational costs necessary for solving a decision problem are not taken into account. In this case, the decision is based on posterior expected utility considerations without considering the costs of the necessary calculations. In Bayesian decision theory, calculation costs that depend on the choice of the domains of the involved quantities are usually neglected. This is because Bayesian theory generally does not address questions regarding the task-specific generation of these domains⁹. In particular, modifications of a Bayesian model with respect to $Z$ are usually not analyzed.

To make such an analysis possible, let $C[U]$ denote the costs caused by a Bayesian fusion scheme that is focussed on $U \subseteq Z$, and let $a_U$ denote the respective optimal action. The effective posterior expected utility is defined by
$$\mathbb{U}_{\mathrm{eff}}\big[a_U; p(z|d), C[U]\big] := \mathbb{U}[a_U; p(z|d)] - C[U].$$

Generally, the inequality $C[U] < C[Z]$ will hold for $U \subset Z$. Hence, the effective posterior expected utility of an action that is locally optimal with respect to $U \subset Z$ may exceed the effective posterior expected utility of the globally optimal action. This means that it may hold that
$$\mathbb{U}_{\mathrm{eff}}\big[a_Z; p(z|d), C[Z]\big] < \mathbb{U}_{\mathrm{eff}}\big[a_U; p(z|d), C[U]\big].$$



9 See, for example, the discussion in [BS04] about the generation of a dynamic frame of discourse.
That is, taking into account the significant savings in storage and computational costs that can be achieved by focussed Bayesian fusion, a locally optimal action may effectively be a better choice than a globally optimal action. Assuming that $u(a,z)$ is uniquely chosen, non-negative, and bounded, the following estimate holds:



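$$
\begin{aligned}
0 &\le \mathbb{U}[a_Z; p(z|d)] - \mathbb{U}[a_U; p(z|d)] \\
&= \int_{z \in U} \big(u(a_Z,z) - u(a_U,z)\big)\,p(z|d)\,dz + \int_{z \in Z \setminus U} \big(u(a_Z,z) - u(a_U,z)\big)\,p(z|d)\,dz \\
&\le \int_{z \in Z \setminus U} \big(u(a_Z,z) - u(a_U,z)\big)\,p(z|d)\,dz \\
&\le \big(1 - P(U|d)\big) \Big(\max_{a \in A} \max_{z \in Z \setminus U} u(a,z) - \min_{a \in A} \min_{z \in Z \setminus U} u(a,z)\Big) \\
&\le \big(1 - P(U|d)\big) \max_{a \in A} \max_{z \in Z \setminus U} u(a,z).
\end{aligned}
$$
The first inequality after the split holds because $a_U$ also maximizes $\int_U u(a,z)\,p(z|d)\,dz = P(U|d)\,\mathbb{U}[a; p_U(z|d)]$, so the integral over $U$ is non-positive.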



This means, for example, that a Bayesian model focussed on $U \subset Z$ is preferable for solving a decision problem if the global posterior relevance $P(U|d)$ of the local context $U$ is large and if the utility function takes low values for all actions and all those values of $z$ that lie in the part of $Z$ ignored by the focussing, in comparison to the costs that an evaluation of this part of $Z$ would cause.

If rational decision making is equated with expected utility maximization, the given justification still has a shortcoming, because mathematically the effective posterior expected utility does not have the form of an expected utility.

However, in reality, the described kind of expected utility maximization cannot always be termed rational: strictly speaking, the calculation of an expected utility itself corresponds to an act and may not possess maximal expected utility [Sar09].

To formalize this fact, Sargent introduced the concept of decision levels in [Sar09]. By an (informal) adaptation of this concept, the meaning of effective posterior expected utilities can be further explained within the decision-theoretic context. The application of a global Bayesian model and the application of a Bayesian model focussed on $U \subset Z$ for solving the decision problem correspond to acts on a second decision level. Their utilities can be defined to be the respective effective posterior expected utilities. After the second-level decision has been made, the actual decision problem is solved on the first level by expected utility maximization on the basis of the Bayesian model chosen on the second level. Hence, on the first decision level, the choice of a locally optimal action that is not globally optimal may be fully rational.

An exact weighting of costs in the form of computational and storage costs against an abstract kind of utility may not be possible. As a consequence, an exact mathematical solution of the second-level decision problem will not work in general. It is stressed, however, that the main application area of local Bayesian fusion approaches is tasks where global Bayesian fusion is not feasible due to its prohibitive complexity. In such a case, the costs of global Bayesian fusion necessarily exceed the expected utility of a globally optimal action. It is also clear that local Bayesian fusion approaches deliver the best possible results if the size of the local context $U$ is scaled according to the available resources.



5.2 Use of Probability Intervals 

The probability interval scheme derived in Section 3.3 offers an additional approach to handling decision problems that are solved on the basis of a focussed Bayesian model. As an example, a technique for the case of a discrete $Z$, which traces back to Fishburn [Fis64], is adapted here. To make it directly applicable, (3.7) is rewritten in the following form:

$$f(z) \le p(z|d) =: f(z) + h(z) \le f(z) + g(z), \qquad z \in Z. \qquad (5.1)$$

The functions $f(z)$ and $g(z)$ are known, while $h(z)$ is unknown. According to (3.7), we have $f(z) = 0$ for $z \in Z \setminus U$. It must hold that
$$\sum_{z \in Z} h(z) = 1 - \sum_{z \in U} f(z) \quad \text{and} \quad 0 \le h(z) \le g(z), \quad z \in Z. \qquad (5.2)$$

On the basis of (5.1), bounds for the differences of the global posterior expected utilities of actions $a_1, a_2 \in A$ are obtained:
$$
\begin{aligned}
\mathbb{U}[a_1; p(z|d)] - \mathbb{U}[a_2; p(z|d)]
&= \sum_{z \in U} \big(u(a_1,z) - u(a_2,z)\big)\,f(z) + \sum_{z \in Z} \big(u(a_1,z) - u(a_2,z)\big)\,h(z) \\
&\in \big[m(a_1,a_2),\, M(a_1,a_2)\big].
\end{aligned}
\qquad (5.3)
$$

The first term on the right-hand side of the identity in (5.3) is calculable within the focussed Bayesian model, even without regarding values of the utility function for $z \in Z \setminus U$. This does not hold for the second term, which contains the unknown function $h(z)$.
If $m(a_1,a_2) > 0$ holds, $a_1$ definitely has a higher posterior expected utility than $a_2$. Hence, $a_2$ is surely not globally optimal in this case. Analogously, if $M(a_1,a_2) < 0$, $a_2$ is globally preferable to $a_1$, and $a_1$ is surely not globally optimal. Additionally, if $[m(a_1,a_2), M(a_1,a_2)] \subseteq [c, d]$, it holds that $\mathbb{U}[a_1; p(z|d)] \in \big[\mathbb{U}[a_2; p(z|d)] + c,\; \mathbb{U}[a_2; p(z|d)] + d\big]$.

Fishburn also describes a method for identifying $[m(a_1,a_2), M(a_1,a_2)]$. First, the elements of $Z$ are ordered according to the value of their utility differences with respect to $a_1$ and $a_2$. For this, we determine a sequence $(z(i))_{i \in \{1,\dots,|Z|\}}$ as follows:
$$
\begin{aligned}
z(1) &:= \arg\max_{z \in Z} \big(u(a_1,z) - u(a_2,z)\big), \\
z(i) &:= \arg\max_{z \in Z \setminus \{z(1),\dots,z(i-1)\}} \big(u(a_1,z) - u(a_2,z)\big), \qquad i = 2, \dots, |Z|.
\end{aligned}
\qquad (5.4)
$$



To identify $M(a_1,a_2)$, this sequence is then processed in ascending order of $i$. For each $i \in \{1,\dots,|Z|\}$, the value of $h(z(i))$ is set to the maximal possible value such that $h(z)$ satisfies (5.2). Inserting the resulting function $h(z)$ into (5.3) delivers $M(a_1,a_2)$. The calculation of $m(a_1,a_2)$ works analogously; for this, the sequence (5.4) has to be processed in reverse order.
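The greedy procedure can be sketched in Python for a discrete $Z$. The function below is an illustrative reading of the steps described above, not Fishburn's original formulation; it assumes that the constraints (5.2) are feasible (i.e., $\sum_z g(z) \ge 1 - \sum_{z \in U} f(z)$), and the name `fishburn_bounds` is hypothetical. Filling $h$ greedily along the ordering maximizes (respectively minimizes) the linear second term of (5.3) under these box and sum constraints.

```python
def fishburn_bounds(delta, f, g):
    """Bounds (m, M) on U[a1; p(z|d)] - U[a2; p(z|d)] under (5.1) and (5.2).

    delta: dict z -> u(a1, z) - u(a2, z) for all z in Z
    f:     dict z -> known lower bound f(z), zero outside U
    g:     dict z -> known width g(z), so that 0 <= h(z) <= g(z)
    """
    budget = 1.0 - sum(f.values())                      # required sum of h(z), cf. (5.2)
    known = sum(delta[z] * f[z] for z in f)             # first term of (5.3)
    order = sorted(delta, key=delta.get, reverse=True)  # z(1), z(2), ... of (5.4)

    def fill(sequence):
        # Give each state as much mass as (5.2) allows, in the given order.
        remaining, total = budget, 0.0
        for z in sequence:
            h = min(g[z], remaining)
            total += delta[z] * h
            remaining -= h
            if remaining <= 0.0:
                break
        return total

    return known + fill(reversed(order)), known + fill(order)
```

If the returned lower bound $m$ is positive, $a_2$ can be discarded as not globally optimal; if the upper bound $M$ is negative, $a_1$ can be discarded.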

In general, the identification of $m(a_1,a_2)$ and $M(a_1,a_2)$ cannot be realized completely locally. Therefore, a further adaptation of the described procedure to local approaches could be a topic of further research.

For the rest of this section, the open problems are discussed exemplarily with respect to the calculation of $M(a_1,a_2)$. The determination of an ordering according to (5.4) may cause problems because it has to be done for all $z \in Z$. In the resulting sequence, elements of $Z \setminus U$ may stand at any position. Verifying the second condition in (5.2) for the determination of $h(z)$ is not critical. Checking the first condition in (5.2) is only locally realizable if there exists a $j \in \{1,\dots,|Z|\}$ such that $T := \bigcup_{i=1}^{j} \{z(i)\} \subseteq U$ and $\sum_{z \in T} h(z) = 1 - \sum_{z \in U} f(z)$ hold.



6 Conclusion 

An interval scheme for global probabilities, which is calculable in a focussed Bayesian model, has been presented. Its lower bounds are calculable if the prior relevance of the local context can be rated. Focussed Bayesian fusion has also been analyzed using concepts from information theory and decision theory. These considerations have yielded new results concerning the global meaning and consistency of focussed Bayesian fusion. It has been demonstrated that solving a decision problem on the basis of a focussed Bayesian model can be termed rational.
Abstract: Today's manufacturing plants are faced with continuous changes and enhancements. They are equipped with heterogeneous software systems for different types of tasks, covering both manufacturing operations and factory planning. To remain competitive, plant operators have to respond quickly to the situation and requirements of the market. Any change has to be considered, and the equipment and the information technology have to be able to adapt quickly. Technology has to support this: efficient modifications result in an increased demand for adaptability or flexibility.

For an efficient and usable information exchange, all systems involved have to interact as seamlessly as possible in the heterogeneous environment. This is called interoperability, which is based on compliance with consistent standards. Therefore, it is important not only how to communicate, but also what to communicate. Data descriptions must be examined as well as suitable communication. MES (manufacturing execution systems) have to form part of the integrated industrial engineering chain, from mechanical engineering and PLC programming to operations. This contribution deals with some of the challenges that have to be handled to achieve interoperability.



1 Introduction 

MES have to cope with different challenges to achieve interoperability. Interoperability in this context is the ability of different independent production-related systems to cooperate and exchange information efficiently. Semantic interoperability means that the systems also understand each other. This can also be described as seamless semantic integration. This requires flexibility or even mutability. Flexibility means the ability to react, or the adaptability, to several known issues. Ultimately, this leads to adaptability and the ability to react to unknown issues. This is called mutability.

These topics mainly address the engineering phase of MES. In the following, engineering is viewed as a creative planning process that is carried out step by step in collaboration between different disciplines, such as mechanical or electrical engineers. Within this process, the involved persons search for a solution to an individual, but not particular, task. This is very complex and involves many participants, ideas, and tools. Thus, it is expensive, time-consuming, and error-prone [Dra08].

The basis of interoperability and flexibility is an integrated engineering chain that also includes MES. Therefore, the complete engineering process of MES has to be changed and improved. At present, engineering of MES takes place at the end of the plant planning or reconfiguration process and is done manually. This results in a large amount of manual effort and is cost-intensive and error-prone.

Many different groups are engaged in improving the engineering process, not only in the field of MES, but also in the more general area of automation and control technology. A German example is the VDI (Verein Deutscher Ingenieure) standardization committee called ‘Integrated engineering of control systems’ (VDI/VDE-Gesellschaft Mess- und Automatisierungstechnik (GMA): Fachausschuss 6.12 ‘Durchgängiges Engineering von Leitsystemen’) [SLF+08], [FSM09].

In this contribution, the complex field of MES is represented by production monitoring and control systems, which are a special type of production-related IT. In this context, a production monitoring and control system is a complex central or decentral IT system for collecting, aggregating/condensing, and processing process signals and values in real time. It has a controlling effect on manufacturing and assembly processes, either in an automated way or by means of user interventions. In line with the definition by Polke in [Pol94], a control system is meant to support shop floor staff in managing their equipment and in controlling and monitoring the production processes.

Challenges and possibilities for the interoperability of MES are depicted in Figure 1.1. These are: vertical interoperability (1), horizontal interoperability (2), lifecycle interoperability (3), the interoperability interface (man-machine interoperability/interaction, 4), and an interoperability data representation (interoperability data format, 5). They all lead to new potentials and possibilities (6).
Figure 1.1: Interoperability of MES



In the following, the author explains how these challenges can be faced. Different solutions, concepts, and tools are described, which were developed in various research and development projects at Fraunhofer IITB and Universität Karlsruhe (TH), some of them in cooperation with industrial partners.



2 Challenges and Concepts Towards Semantic Interoperability of MES

2.1 Vertical MES Interoperability 



Future MES will be integrated vertically with the shop floor and will collaborate and interact with components of this level. Plug-and-work mechanisms support this integration. If something changes within the plant, this leads to changes within the MES. The vision is an automated integration of new components and an automated adaptation of existing components.

The ProduFlexil research project [SBQ+09], [EOB07], sponsored by the Federal Ministry of Education and Research, deals with adaptivity and self-configuration of manufacturing plants by developing appropriate software mechanisms and architecture patterns that allow plants and plant components to be integrated into an existing production system as efficiently as possible. In this context, the focus is on enhancing the flexibility of the plant software and on the configuration and integration of superordinate systems, e.g., a production monitoring and control system.



In the production of the ‘Mercedes C-class’, the production monitoring and control system is connected to 450 PLCs and administers roughly 1000 process visualization images. According to Fraunhofer IITB's experience in this field, the manual engineering effort is 30-40 percent of the total effort for such a software solution. Of this share, plug-and-play could save about 80 percent.

The prerequisite for automated engineering is a self-description containing all information that the production monitoring and control system requires from different sources [BMSE08]. This includes details about the process signals but also visualization data. Existing standards in the field of automation technology limit the amount of information that can be described. The vision is a plant that can simply be linked into the process by ‘plug-and-work’.

Production monitoring and control system engineering of the future will therefore require a consistent and neutral data format into which existing data formats can be integrated. It has to answer the question ‘What to communicate?’: the necessary contents must be structured, and at the same time their semantics must be clear. In addition, standardized communication and processing mechanisms are necessary to answer the question ‘How to communicate?’, which covers the process as well as the methods. For this purpose, IITB presents a solution that combines two standards into one framework for automated engineering [SE07].

The vendor-independent, XML-based CAEX (Computer Aided Engineering Exchange, IEC 62424) data exchange format was originally developed for process engineering. Fraunhofer IITB, however, has shown that the format is also suited for the efficient exchange of data in production engineering [SDS08]. The format takes into account that standardizing the tools available on the market makes sense only to a certain degree, if at all. CAEX structures and organizes the exchange of CAE (Computer Aided Engineering) planning data between different systems, and it is used here as a standardized data format for the automated engineering of production monitoring and control systems. It is explained in more detail below.

In this application, CAEX was complemented by OPC UA (OPC Unified Architecture, IEC 62541), the service-oriented successor to the industrial OPC communication standard. OPC UA provides data communication, synchronization, and processing in the prototypical engineering framework (see [Sch08]).

The underlying idea is a consistent standard interface, standardized communication with all systems involved, service-oriented processing, investment protection with respect to supplier-specific formats, and ultimately improved data quality through automated processes. Combining both standards in one framework builds on the strengths of each and opens up new potentials for the highly topical subject of “automation of automation” [Sch09b].
2.2 Horizontal MES Interoperability



MES components, even from different vendors, must be integrated horizontally. This is achieved by instruments and concepts such as Service-Oriented Architectures (SOA), ontologies, or integrated data management. ProVis.Agent® is a production monitoring and control system and, in this respect, a core component of modern MES. Its underlying software-agent-based communication enables our customers to integrate existing applications, which opens up new synergy potentials. ProVis.Agent® can be seen as a vacuum cleaner for data, in other words, a data integration platform.



Today, MES are isolated applications. Tomorrow, they will have to be interconnected in an SOA, for instance one based on OPC UA (OPC Unified Architecture). Each MES has its own view of the common information.



As mentioned above, OPC UA is a standard that supports process communication in a structured way with an underlying user-defined information model. The information model enables users to create a representation of their plant using the object-oriented modeling paradigm. For this purpose, the OPC Foundation provides an XML schema for describing these models. Furthermore, the Foundation has defined a graphical representation for OPC UA information models, which is much more intuitive for users than the XML representation.



For the information model, Fraunhofer IITB developed a graphical editor (see Figure 2.1). It allows users to model graphically the ‘address spaces’ of OPC UA servers, which are the ‘living’ online representation of the underlying information model. To this end, it provides the graphical base elements defined by the OPC Foundation and behaves like a normal painting program. The whole application is based on Silverlight technology so that it can run as a web application.



The defined graphical model can be exported to an XML file that conforms to the UA XML schema and can be imported into OPC UA servers, but the UA Modeller goes even further. It can import CAEX data (see Section 2.4) and transform it into an OPC UA-conformant description. Thus, MES can take advantage of the capabilities of OPC UA servers, and CAEX data can be managed online.
Figure 2.1: OPC UA Modeller
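How a CAEX instance hierarchy might be turned into an OPC UA-style address space can be sketched as follows. This is not the UA Modeller’s implementation: the mapping rules (InternalElement to an Object node, nesting to a HasComponent reference, Attribute to a Variable node via HasProperty) and the string NodeIds built from the element path are our assumptions, and the CAEX snippet is simplified (no namespace, no schema version).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Simplified CAEX instance hierarchy (no namespace or schema version).
CAEX = """
<CAEXFile FileName="line.xml">
  <InstanceHierarchy Name="Plant">
    <InternalElement Name="Conveyor1" ID="ie-1">
      <Attribute Name="Speed"><Value>0.5</Value></Attribute>
      <InternalElement Name="Motor1" ID="ie-2"/>
    </InternalElement>
  </InstanceHierarchy>
</CAEXFile>
"""


@dataclass
class UaNode:
    node_id: str                      # string NodeId, e.g. "ns=1;s=Plant.Conveyor1"
    browse_name: str
    node_class: str                   # "Object" or "Variable"
    value: Optional[str] = None
    references: List[Tuple[str, str]] = field(default_factory=list)


def caex_to_ua(xml_text: str, ns: int = 1) -> Dict[str, UaNode]:
    """Map a CAEX instance hierarchy to a flat dictionary of UA-style nodes."""
    nodes: Dict[str, UaNode] = {}

    def visit(elem: ET.Element, path: str) -> str:
        node_id = f"ns={ns};s={path}"
        nodes[node_id] = UaNode(node_id, elem.get("Name"), "Object")
        # CAEX attributes become variables referenced via HasProperty.
        for attr in elem.findall("Attribute"):
            var_id = f"ns={ns};s={path}.{attr.get('Name')}"
            nodes[var_id] = UaNode(var_id, attr.get("Name"), "Variable",
                                   value=attr.findtext("Value"))
            nodes[node_id].references.append(("HasProperty", var_id))
        # Nested internal elements become components of their parent.
        for child in elem.findall("InternalElement"):
            child_id = visit(child, f"{path}.{child.get('Name')}")
            nodes[node_id].references.append(("HasComponent", child_id))
        return node_id

    root = ET.fromstring(xml_text.strip())
    for hierarchy in root.findall("InstanceHierarchy"):
        visit(hierarchy, hierarchy.get("Name"))
    return nodes


for node in caex_to_ua(CAEX).values():
    print(node.node_id, node.node_class, node.value, node.references)
```

A real OPC UA server would additionally need type information, for example a HasTypeDefinition reference derived from the referenced SystemUnitClass, which this sketch omits.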



It is essential for horizontal integration that MES ‘understand’ other systems. Therefore, IITB is involved in standardization. Within the Association of German Engineers (VDI), IITB leads the working group ‘MES logical interfaces’, which is part of the MES standardization committee (VDI-Gesellschaft Produkt- und Prozessgestaltung (VDI-GPP): Arbeitsgruppe ‘Logische Schnittstellen MES-Maschinenebene’ des Fachausschusses 2.5.1 ‘Manufacturing Execution Systems’). The committee is motivated by the existence of numerous standards and the lack of a single common one. As a result, machine and plant manufacturers have to adapt to the particular standard of each customer, which causes manual effort. The goal of the committee is to standardize the communication contents between production plants and MES. To this end, machine and plant manufacturers and operators jointly develop a common interface. The contents are described semantically by means of an OWL (Web Ontology Language) ontology. Every user can read this description, interpret it, and integrate it into his automation and IT environment. The results will be published.



2.3 Man-machine Interface for MES Interoperability 

Interoperability involves the interaction of different systems as well as different users. Thus, the interface between man and machine is essential. Human-centered computing, which provides users only with the information required for their tasks and roles, has to be taken into account, as do legal effects. The modified engineering process results from the user’s requirements and possibilities. Hence, the automated engineering framework consists of different tools that give the user the possibility to influence the automatically generated results. Examples of such tools are the layout manager and the process-product-resource (PPR) visualization for the engineering framework (see Section 2.1), which are described hereinafter. Furthermore, a modified engineering process resulting from the automation of automation has to be considered.



Within the engineering process, the generation of the process control images as the human-machine interface has to be considered above all. If users create process control images manually, the very same process may be depicted differently depending on the preferences of the person who drafted them. Thus, the process control images should be as standardized as possible while, at the same time, being as individual as necessary.



In manufacturing, an automated system for image generation will only be accepted if its user interface is user-friendly and intuitive. The special field of human engineering [Syr70] aims at adapting machinery and other technical equipment to humans in order to optimize their cooperation. The characteristics, potentials, and requirements of human beings are taken into account, and the visualization of machinery and equipment is based on these conditions. In this process, visualization must follow ergonomic guidelines.



The layout of the visualized information has to be as clear and well-arranged as possible, enabling even inexperienced users to interpret the data intuitively. In addition, appropriate algorithms have to be developed to position the existing equipment components as well as I/O signals on the process control image in line with the actual layout. Finally, users should be able to adapt the process control images to their personal requirements. The layout manager generates process images from CAEX data. The user only has to define his preferences; the rest is done automatically [SS08]. This ensures that the visualization is consistent and always up to date.
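One ingredient of such a layout algorithm, placing equipment on the process image in line with the actual plant layout, can be sketched as a uniform scaling of plant coordinates into image coordinates. This is a hypothetical illustration, not the layout manager’s algorithm: we assume that each equipment component carries 2-D plant coordinates (for example taken from CAEX attributes) and that image size and margin are user preferences.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Equipment:
    name: str
    x: float  # plant coordinates in metres (assumed to be available in the CAEX data)
    y: float


def layout(equipment: List[Equipment], width: int = 800, height: int = 600,
           margin: int = 40) -> Dict[str, Tuple[int, int]]:
    """Place equipment on a process image while keeping the plant's proportions."""
    min_x = min(e.x for e in equipment)
    min_y = min(e.y for e in equipment)
    span_x = (max(e.x for e in equipment) - min_x) or 1.0
    span_y = (max(e.y for e in equipment) - min_y) or 1.0
    # One common scale factor, so the image is not distorted.
    scale = min((width - 2 * margin) / span_x, (height - 2 * margin) / span_y)
    positions = {}
    for e in equipment:
        px = margin + (e.x - min_x) * scale
        # Image rows grow downwards, whereas the plant y axis grows upwards.
        py = height - margin - (e.y - min_y) * scale
        positions[e.name] = (round(px), round(py))
    return positions


plant = [Equipment("Conveyor1", 0.0, 0.0), Equipment("Turntable1", 10.0, 0.0),
         Equipment("TestStation1", 10.0, 5.0)]
print(layout(plant))
# {'Conveyor1': (40, 560), 'Turntable1': (760, 560), 'TestStation1': (760, 200)}
```

The sketch ignores overlapping symbols, hierarchy levels, and the placement of I/O signals, which a complete layout algorithm would have to handle.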



In addition to the typical resource visualization in control technology, the engineering component also visualizes the products and processes involved in the production process [Sch09a]. Products stand for all products and product components, processes include all kinds of activities, and resources comprise equipment, staff, software, etc. This classification gives the system elements additional semantic meaning, such as ‘I am a product’, ‘I am a resource’, or ‘I am a process’. The individual types of elements can be linked with each other, with resources being the central component of this model, since resources execute processes and resources process products. The movement of produced products and changes in ongoing processes (activities) are visualized on the basis of dynamically updated CAEX data. The structure of elements, equipment, and products contained in the images is created dynamically. The allocation of products and processes to a resource (equipment) is likewise performed dynamically. Processes and product positions do not need to be updated in real time for control technology. The process signals, by contrast, continue to be visualized in real time.
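The resource-centered PPR linking described above can be illustrated with a small data structure. This is a hypothetical sketch, not the engineering component’s data model: each resource records the processes it executes and the products it currently processes, and moving a product corresponds to a dynamic reallocation after an update of the CAEX data. The element names are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Resource:
    """Central PPR element: equipment, staff, software, etc. (hypothetical model)."""
    name: str
    processes: Set[str] = field(default_factory=set)  # processes the resource executes
    products: Set[str] = field(default_factory=set)   # products it currently processes


class PPRModel:
    def __init__(self) -> None:
        self.resources: Dict[str, Resource] = {}

    def resource(self, name: str) -> Resource:
        return self.resources.setdefault(name, Resource(name))

    def assign_process(self, resource: str, process: str) -> None:
        self.resource(resource).processes.add(process)

    def move_product(self, product: str, target: str) -> None:
        """Re-allocate a product to another resource, e.g. after a CAEX update."""
        for res in self.resources.values():
            res.products.discard(product)
        self.resource(target).products.add(product)

    def location_of(self, product: str) -> Optional[str]:
        for res in self.resources.values():
            if product in res.products:
                return res.name
        return None


ppr = PPRModel()
ppr.assign_process("Filling machine", "Fill")
ppr.assign_process("Closing machine", "Close")
ppr.move_product("Jar1", "Filling machine")
ppr.move_product("Jar1", "Closing machine")
print(ppr.location_of("Jar1"))  # Closing machine
```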



2.4 Data Representation for MES Interoperability 



Improving efficiency in MES engineering and achieving interoperability require an integrated data representation: an interoperability data format.



CAEX is one possible example of such a format. It was developed in cooperation between the Department of Process Control Engineering of RWTH Aachen and the ABB Research Center in Ladenburg. CAEX is defined in the standard IEC 62424. It is a semi-formal, XML-based description language that contains an XML meta model for describing the setup and structure of plant data. First and foremost, the format supports library concepts and object-oriented approaches: libraries from users and suppliers as well as project libraries can be integrated. In addition, both top-down and bottom-up system design are supported. The technical innovation of this approach is the syntactic and semantic unification of the data, which allows the required configuration algorithms to be decoupled from the data sources. As mentioned above, CAEX is used as the data format within the engineering framework. The CAEX-Editor hides the XML-specific syntax and enables the user to comprehend the data structure at a glance. Hence, it is intuitive to use and helps users to understand CAEX.
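The library concept can be illustrated with a minimal CAEX-like document built with Python’s standard `xml.etree.ElementTree`. The element and attribute names (`InterfaceClassLib`, `RoleClassLib`, `SystemUnitClassLib`, `InstanceHierarchy`, `InternalElement`, `RefBaseSystemUnitPath`, `RoleRequirements`, `RefBaseRoleClassPath`) follow our reading of IEC 62424; the document is simplified, has not been validated against the CAEX schema, and all library and object names are invented.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_caex() -> ET.Element:
    """Build a small CAEX-like file: libraries plus one instance referencing them."""
    root = ET.Element("CAEXFile", FileName="example.xml")
    # Libraries hold reusable definitions (user, supplier, or project libraries).
    interfaces = ET.SubElement(root, "InterfaceClassLib", Name="Interfaces")
    ET.SubElement(interfaces, "InterfaceClass", Name="SignalInterface")
    roles = ET.SubElement(root, "RoleClassLib", Name="Roles")
    ET.SubElement(roles, "RoleClass", Name="Conveyor")
    units = ET.SubElement(root, "SystemUnitClassLib", Name="SupplierLib")
    belt = ET.SubElement(units, "SystemUnitClass", Name="BeltConveyor")
    ET.SubElement(belt, "ExternalInterface", Name="Start",
                  RefBaseClassPath="Interfaces/SignalInterface")
    # The instance hierarchy describes the concrete plant (object-oriented instances).
    plant = ET.SubElement(root, "InstanceHierarchy", Name="Line1")
    conveyor = ET.SubElement(plant, "InternalElement", Name="Conveyor1", ID="ie-1",
                             RefBaseSystemUnitPath="SupplierLib/BeltConveyor")
    ET.SubElement(conveyor, "RoleRequirements", RefBaseRoleClassPath="Roles/Conveyor")
    return root


def resolve_system_unit(root: ET.Element, path: str) -> Optional[ET.Element]:
    """Look up a 'Library/Class' path in the SystemUnitClass libraries."""
    lib, cls = path.split("/", 1)
    return root.find(f"SystemUnitClassLib[@Name='{lib}']/SystemUnitClass[@Name='{cls}']")


root = build_caex()
instance = root.find("InstanceHierarchy/InternalElement")
print(resolve_system_unit(root, instance.get("RefBaseSystemUnitPath")).get("Name"))  # BeltConveyor
```

Because instances refer to library classes by path, supplier and project libraries can be exchanged independently of the instance hierarchy.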



Another interoperability data representation that meets these challenges is the Automation Markup Language (AutomationML, http://www.automationml.org), which is a superset of CAEX. AutomationML is an XML-based data format for the exchange of plant engineering information. It is defined by the AutomationML organization, whose members are the companies Daimler, ABB, KUKA, Siemens, Phoenix Contact, NetAllied, and Zühlke as well as Fraunhofer IITB and the Universities of Karlsruhe and Magdeburg. The standardization committee ‘DKE K941.0.2 AutomationML’ (Deutsche Kommission Elektrotechnik Elektronik Informationstechnik (DKE): Gremium ‘DKE K941.0.2 AutomationML’) works on the standardization of the format. The vision is a seamless integration of production monitoring and control systems into the virtual startup and a digital production release. This allows plants to be evaluated and optimized within the Digital Factory by the software systems used during real-world operation. The mission is to interconnect engineering tools of different disciplines and consequently to reduce the work-intensive engineering process. To this end, AutomationML integrates different standards and consolidates them ‘under one umbrella’. Hence, the AutomationML architecture consists of many different formats. CAEX was chosen as the top-level, semantically integrating data format. It is supplemented by COLLADA (http://collada.org/) for geometry and kinematics and PLCopen XML (http://www.plcopen.org/) for behavior and sequencing. Further standard formats can be integrated (see [DLPH08]). The author expects that AutomationML will become the ‘glue for seamless automation engineering’.

From the semantic point of view, ontologies are also an adequate instrument for modeling manufacturing and process plants. CAEX and AutomationML therefore adopt several concepts and mechanisms from ontologies and can be seen as domain-specific specializations.



2.5 Concept of Automated Engineering 

Figure 2.2 depicts the overall concept for automated MES engineering. Its goal is to provide maximum assistance to the user. To this end, the CAEX importer creates system-specific CAEX from the existing XML export of the users’ tools. This CAEX data is unified with CAEX data from other tools, which requires a mapping consisting of transformation and fusion operations. After an MES view on these data has been extracted, a reverse generator produces system-specific XML data from this CAEX data. The whole processing is based on OPC UA and is complemented by a GUI-based man-machine interface. The following paragraphs cover specific parts of the concept, which are depicted in the corresponding figures.




Figure 2.2: Overall concept for automated MES engineering 
Almost every production planning tool possesses an XML export. The CAEX importer developed for semantic import (shown in Figure 2.3) takes advantage of this and utilizes XML technology. Its purpose is to transform a system-specific XML file into a system-specific CAEX file. In this process, it supports the user and reduces manual work. At the same time, it enables the user to integrate his implicit knowledge and helps to model this knowledge explicitly.



Semantics are incorporated in this process by turning the specific XML tags and structures into standardized CAEX structures. To achieve this, several steps have to be carried out. First, the user specifies one or more connections between elements of the CAEX XML schema and elements within his XML export file. This can be assisted by an interactive CAEX explanation describing the four base structures: Interfaces (logical, technical, and mechanical interfaces), SystemUnits (types), InternalElements (instances), and Roles (semantic identifiers/names). After all possible relations have been defined, the user starts the ‘Autoimport’. Within this step, XML and XPath technology is used to create multiple connections between elements, starting from the initially connected elements. After the user has reviewed the automatically created connections, the original XML can be transformed into a ‘raw CAEX’. This means that elements of specific CAEX types already exist, but there are no semantic connections between them yet. These are established in the next step: roles from an existing CAEX role library (either standardized, e.g., by AutomationML, or user-defined) are assigned to the generated elements. This interlinking must be done by the user, as the CAEX importer cannot infer the meaning of an element; however, learning mechanisms could be applied to support and extend this in the future. Further development will certainly focus on expanding the user assistance.




Figure 2.3: Semantic CAEX importer 
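The three importer steps (user-defined connections, ‘Autoimport’, role assignment) can be sketched with Python’s `xml.etree.ElementTree`, whose `findall` supports a subset of XPath. This is an illustration under assumptions, not the CAEX importer itself: the tool export format, the tag names `Project`/`Line`/`Station`, the generated IDs, and the role library path `MyRoleLib/Conveyor` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical XML export of a planning tool (format invented for illustration).
TOOL_EXPORT = """<Project>
  <Line name="Line1">
    <Station name="Conveyor1" kind="belt"/>
    <Station name="Turntable1" kind="turn"/>
  </Line>
</Project>"""

# Step 1 (user): connect elements of the export to CAEX base structures.
USER_MAPPING: Dict[str, str] = {".//Station": "InternalElement"}


def auto_import(xml_text: str, mapping: Dict[str, str],
                hierarchy: str = "Imported") -> ET.Element:
    """Step 2 ('Autoimport'): apply each connection to all matching elements (raw CAEX)."""
    source = ET.fromstring(xml_text)
    caex = ET.Element("CAEXFile")
    ih = ET.SubElement(caex, "InstanceHierarchy", Name=hierarchy)
    for xpath, caex_tag in mapping.items():
        for i, elem in enumerate(source.findall(xpath), start=1):
            ET.SubElement(ih, caex_tag, Name=elem.get("name"), ID=f"{caex_tag}-{i}")
    return caex


def assign_roles(caex: ET.Element, roles: Dict[str, str]) -> ET.Element:
    """Step 3 (user): add semantics by linking raw elements to role classes."""
    for ie in caex.iter("InternalElement"):
        role_path = roles.get(ie.get("Name"))
        if role_path is not None:
            ET.SubElement(ie, "RoleRequirements", RefBaseRoleClassPath=role_path)
    return caex


raw = auto_import(TOOL_EXPORT, USER_MAPPING)
caex = assign_roles(raw, {"Conveyor1": "MyRoleLib/Conveyor"})
print(ET.tostring(caex, encoding="unicode"))
```

After step 2, both stations exist as InternalElements without semantics; only the element the user linked to a role in step 3 carries a RoleRequirements entry.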
Once a system-specific CAEX file is available, the next goal is to unify the heterogeneous landscape of system-specific data by integrating it into one CAEX ‘world’. This is called semantic MES mapping (SMM). It consists of the transformation of the existing files as well as their fusion. One basic tool for this is depicted in Figure 2.4. It serves as a platform for integrating the different mapping mechanisms that can be applied, for instance name equivalence or structural equivalence.
Figure 2.4: CAEX mapping
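The two mapping mechanisms named above can be illustrated in a few lines. A minimal sketch, assuming that element names are plain strings and that a hierarchy is given as a dictionary from each element to its children; the normalization rule and the shape signature are our own choices, not those of the SMM tool.

```python
import re
from typing import Dict, List

Tree = Dict[str, List[str]]  # element -> child elements


def normalize(name: str) -> str:
    """Name equivalence: ignore case, whitespace, and common separators."""
    return re.sub(r"[\s_.\-]", "", name).lower()


def name_matches(names_a: List[str], names_b: List[str]) -> Dict[str, str]:
    index = {normalize(n): n for n in names_b}
    return {n: index[normalize(n)] for n in names_a if normalize(n) in index}


def signature(tree: Tree, node: str) -> str:
    """Structural fingerprint: the sorted fingerprints of all children, nested."""
    return "(" + "".join(sorted(signature(tree, c) for c in tree.get(node, []))) + ")"


def structural_matches(tree_a: Tree, tree_b: Tree) -> Dict[str, str]:
    """Structural equivalence: pair elements whose subtree shape is unique in both trees."""
    def unique(tree: Tree) -> Dict[str, str]:
        groups: Dict[str, List[str]] = {}
        for node in tree:
            groups.setdefault(signature(tree, node), []).append(node)
        return {sig: nodes[0] for sig, nodes in groups.items() if len(nodes) == 1}

    sigs_a, sigs_b = unique(tree_a), unique(tree_b)
    return {sigs_a[s]: sigs_b[s] for s in sigs_a if s in sigs_b}


print(name_matches(["Conveyor_1", "Turntable 1"], ["conveyor1", "TestStation1"]))
tree_a = {"Line": ["Conv1", "Turn1"], "Conv1": ["Motor1"], "Motor1": [], "Turn1": []}
tree_b = {"Plant": ["Belt_A", "Table_B"], "Belt_A": ["Drive"], "Drive": [], "Table_B": []}
print(structural_matches(tree_a, tree_b))
```

Name equivalence alone would miss the pair Conv1/Belt_A, while structural equivalence alone cannot distinguish the two leaf elements; this suggests why a platform that combines several mechanisms is useful.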



To complete the implementation of the concept for the Provis-Production-Suite, a prototypical engineering framework (see Section 2.1 and Figure 2.5) exists that processes plant-describing data in the standardized CAEX format. By means of this CAEX information, ProVis.Agent® can be engineered and images for ProVis.Visu® can be generated. The whole framework is based on OPC UA. It serves as one possible prototype for automated MES engineering.



2.6 MES Interoperability Along the Lifecycle 

Future MES have to be completely connected to the Digital Factory, following the objective of permanent planning. This means that, in the case of a change, every system involved is adapted immediately. This takes place during the whole lifecycle, in which MES are supported by online simulation within the Digital Factory.

Today, the engineering of MES takes place at the end of the plant planning process. In most cases, the plant already exists at that time, and errors recognized then are very time- and cost-intensive to correct. However, this engineering effort can be shifted into earlier phases: MES can be connected to and evaluated against the Digital Factory. In doing so, the full range of features of those systems is available at that early stage. This helps to reduce startup time and to improve the engineering process and its results.
Figure 2.5: Automated engineering framework for MES



From planning to start-up and operation, tools and methods of the Digital Factory are applied. Depending on the development status, as many real systems as possible are introduced. It is important that real control programs are used as early as possible for controlling the model [Mew09]. At an early stage, virtual equipment and virtual controllers (Figure 2.6, (1)) can be linked to the MES. This results in a complete simulation below the MES. In the course of development, virtual controllers can be replaced by real controllers (2). This is the so-called hardware-in-the-loop simulation. Once the equipment with the corresponding controllers is assembled (3), the MES can be connected to it and operate on it. In this way, engineering, evaluation, and optimization can be done step by step.




Figure 2.6: System combinations 
Thus, MES close the loop between operation and planning. Examples of such a virtual startup of MES are given in [SSB09]. In this domain, IITB leads the standardization committee ‘Digital factory operation’ (VDI-Gesellschaft Fördertechnik Materialfluss Logistik (FML): Arbeitsgruppe ‘Digitaler Fabrikbetrieb’ des Fachausschusses ‘Digitale Fabrik’). This group of industrial and research partners is developing a VDI recommendation concerning the main goals of Digital Factory operation, its possible domains of application, and its application in the different phases of the lifecycle. The recommendation will be published as VDI recommendation 4499 part 2.



3 Application Examples, Conclusion and Potentials 



To evaluate the developed tools and methods, various application examples have been created. First, various conveyor systems are available to validate engineering and the generation of simple visualizations. They include conveyor belts and turntables as well as a test station and a welding cell.

Second, a hierarchical example (Figure 3.1, left) was developed to test the layout manager. It consists of different hierarchical levels and different equipment aggregates, including conveyors, turntables, and test stations.

To allow for the overall visualization of resources, processes, and products, one of the conveyor belt examples was complemented by processes and products (Figure 3.1, right). To this end, a three-view modeling concept was developed, which is a prime example of interlinked engineering data.




Figure 3.1: Hierarchy-based sample application (left) and basic image of the 
sample application for resources, products, and processes (right) 
Furthermore, two application examples from the process industry were modeled to show that the developed concepts are also applicable there. The three-view concept was applied there as well; the underlying data format is AutomationML, which was described in Section 2.4. In the first example (see Figure 3.2), a filling and closing line for baby food was modeled. The second example consists of a Tetra filling plant situated within the liquid filling production hall. The example is described in detail in [SD09].




[Screenshot: CAEX instance hierarchy ‘Example_InstanceHierarchy’ of the Tetrapack line (role: Structure) with Resources (Filling machine2, Closing machine1), Processes (Close1, Fill2), and Products (Closed jar1, Filled jar1, Empty jar1, Baby food1, Lid1); the elements carry Topology and PPRConnector interfaces.]
Figure 3.2: Application example of process industry (1) (plant image source: 
Nestle Deutschland AG) 



Semantic interoperability between facilities and superordinate IT systems inside the factory is becoming the major prerequisite for the future of adaptive manufacturing. Open questions are: What will a corresponding standard look like? Will there be many ‘standards’, e.g., standards related to different branches of industry or company-internal standards, or will the manufacturing and automation community be able to design a common standard for various branches and equipment?

The aspects and examples pointed out in this paper illustrate that interoperability of MES is not yet completely realized, but possible. The approaches shown are possible steps towards this goal.

Moreover, new potentials evolve. Among them are automatically generated documentation and order lists, the calculation of a plant’s energy consumption by integrating additional slots into the data representation, and online simulation to support decisions concerning unexpected changes on the shop floor.
Abstract: In many machine vision applications for automated inspection, the illumination design is crucial to the robustness and speed of the inspection process. Therefore, there is a need to investigate and experimentally evaluate new illumination designs and techniques. We briefly review a representative selection of illumination techniques that aim to minimize the effort of defect detection by adapting the illuminating light field to the nominal state of the inspection task. Based on this principle, we propose an illumination technique using a projector-camera system which provides inspection images that directly display differences in reflectance between two scenes.



1 Introduction 

The choice of an appropriate illumination design is one of the most important steps in creating successful machine vision systems for automated inspection tasks. Since in image acquisition all information about a scene is encoded in its exitant light field, the incident light field provided by the illumination must be able to reveal the information about the test object that is relevant to the inspection task. Moreover, in real-time machine vision applications, where time is a major constraint, appropriate illumination can greatly simplify digital image processing tasks and improve their processing time and reliability. For instance, with an illumination that results in images with high contrast between object and background, simple image thresholding may suffice for object segmentation, and more sophisticated and time-consuming algorithms can be avoided.
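The effect of high-contrast illumination on segmentation can be shown with a synthetic example in NumPy. The gray levels, the noise level, and the threshold are assumptions chosen for illustration: with object and background well separated, a single global threshold recovers the object without any more elaborate segmentation algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 8-bit inspection image under high-contrast illumination (assumed values):
# background around gray level 30, object around 200, Gaussian noise with sigma 5.
truth = np.zeros((64, 64), dtype=bool)
truth[20:44, 16:48] = True
image = np.where(truth, 200.0, 30.0) + rng.normal(0.0, 5.0, size=truth.shape)
image = np.clip(image, 0, 255).astype(np.uint8)

# One global threshold between the two gray levels is enough for segmentation.
mask = image > 115
print("object pixels:", int(mask.sum()), "misclassified:", int((mask != truth).sum()))
```

With a low-contrast illumination, the two gray-level distributions would overlap and the same threshold would misclassify pixels, which is where more elaborate algorithms become necessary.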

While there are well-founded design rules for choosing the imaging optics for a 
machine vision system I1BH85I . rules for illumination design are less elaborated. 
However, one general design objective is to provide an illumination that accentuates the features of interest, such as surface defects of a faulty test object, while minimizing distracting features, e.g., flawless object regions. Some popular illumination techniques, like dark-field illumination [Gre04] and techniques based on polarized light [SUW07], explicitly address this demand. In dark-field illumination, a directional illumination is used to enhance the visibility of surface features like scratches or indentations on an otherwise smooth surface. In the presence of a perfectly smooth surface, which is assumed to be the desired nominal state of the inspection task, all incident light is reflected away from the camera’s lens and the captured image is dark. Imperfections on the surface scatter light into the lens and appear as bright image features. Similarly, in an illumination setup where a polarized illumination (polarizer) is used in combination with a crossed polarizing filter in front of the camera (analyzer), object features with polarizing properties can be highlighted. When the test object is viewed in its desired nominal state, where it is assumed that no change in polarization occurs, the light received by the camera is almost completely attenuated by the analyzer, and thus a dark inspection image is captured. However, object features that rotate the angle of polarization, like unwanted refractive index variations or stress in transparent objects, are captured as bright image features. The principle behind both illumination techniques is to generate inspection images that highlight deviations from a predefined desired nominal state with high contrast.
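The crossed-polarizer principle follows from Malus’s law, $I = I_0 \cos^2\theta$, where $\theta$ is the angle between the polarization direction of the light and the analyzer axis. With crossed filters, a rotation $\delta$ introduced by the object gives $\theta = 90^\circ + \delta$ and hence $I = I_0 \sin^2\delta$. A minimal sketch, assuming ideal polarizers (the function name is ours):

```python
import math


def crossed_analyzer_intensity(i0: float, rotation_deg: float) -> float:
    """Intensity behind an analyzer crossed with the polarizer (ideal filters).

    The object rotates the polarization by `rotation_deg`, so the angle to the
    analyzer axis is 90 deg + rotation and Malus's law gives I0 * sin^2(rotation).
    """
    theta = math.radians(90.0 + rotation_deg)
    return i0 * math.cos(theta) ** 2


for rotation in (0.0, 5.0, 10.0, 45.0):
    print(f"rotation {rotation:4.1f} deg -> relative intensity "
          f"{crossed_analyzer_intensity(1.0, rotation):.4f}")
```

In the nominal state (no rotation) the transmitted intensity vanishes, so any polarization-changing feature appears bright against a dark background.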

Recently, some novel inspection methods have been proposed that take the aforementioned principle a step further by employing more sophisticated illumination techniques. Techniques like inverse fringe projection [LBOK04, CS07], inverse patterns for deflectometric inspection [WB07a], and comparative digital holography [BOKJ] are able to directly highlight differences in the shape of two objects. Defect detection by comparing the actual state of a test object with the desired nominal state of a master object is a standard task in industrial inspection. In the case of fringe projection and deflectometry, the shape information from a preceding measurement is used to compute an inverse structured light pattern of the master object, which is then used to evaluate a test object. If master and test object are identical, a predefined undistorted pattern is obtained. Otherwise, shape differences are directly highlighted by local geometric distortions in the projected pattern. In the case of holographic techniques for object comparison, the coherent optical wave field of the master object is obtained by digital holography. By illuminating the test object with the coherent mask of the master object, differences in shape between the two objects are directly displayed in the inspection image.

All the described inspection techniques have in common that the illuminating light field is adapted to the desired nominal state of the inspection task. Hence, the captured inspection images directly highlight deviations from the nominal state and therefore reduce the effort of defect detection through digital image or signal processing. This means that feature extraction for defect detection partly takes place in the optical domain, that is, during image formation. We believe that this illumination principle is a promising technique for fast and robust industrial inspection tasks and is worth investigating in more detail.

In this technical report, we apply this principle to propose an illumination technique that is designed to highlight differences in reflectance and shape between two objects or scenes. Our goal is to provide a spatially adapted illumination pattern that results in a featureless flat-gray inspection image if the illuminated test object is identical to the master object. Differences in reflectance between the two objects, e.g., caused by defects, should however result in detectable image features that highlight the faulty areas. This goal is achieved by utilizing a digital light projector as light source and a technique referred to in the literature as radiometric compensation [NPGB03].

In the following sections, we describe the setup of a projector-camera system that is able to generate such illumination patterns and explain the radiometric compensation methods used in our experiments. Moreover, experimental results are presented, and the advantages of the proposed illumination technique over conventional techniques for deviation detection are discussed. In the following, the adapted illumination pattern that results in a flat-gray camera image is referred to as the inverse illumination mask of a scene or object.



2 Inspection Image Formation by Means of Radiometric Compensation

In recent years, digital video projectors have attracted great interest in the computer vision and graphics community due to enormous technological advancements such as
higher spatial resolution and dynamic range IBIWCOSIIRI AVOOl IGTLL06I . Since 
the radiance of each projector pixel can be controlled separately, projectors are ideally suited as experimental platforms for evaluating new illumination techniques. Systems that combine a digital video projector as controllable illumination with a light-sensing device such as a camera are referred to in the literature as projector-camera systems. Projector-camera systems make it possible to project arbitrarily complex illumination patterns onto a scene and to capture the corresponding images. These optically coded images allow information about the scene to be computed that cannot be retrieved with standard illumination techniques.

Computing the inverse illumination mask of a scene is closely related to the problem of radiometric compensation, which has been widely discussed in the field of projector-camera systems [NPGB03, WB07b, AOSS06]. Radiometric compensation deals with the problem of displaying projected images on arbitrary surfaces with varying color, reflectance, and geometry. When an image is projected onto such a non-cooperative surface, its appearance is modulated by the spatially varying reflectance and distorted by geometric variations. To preserve the originally intended appearance of the projected image, a camera serves as a proxy for the human viewer and provides information on how the projected image has to be compensated prior to projection in order to account for the aforementioned perturbations. For our purposes, radiometric compensation is used to compute the compensated flat-gray image for a given test scene. The compensated image is then exactly the inverse illumination mask that has to be projected onto the scene in order to capture a flat-gray inspection image.

(a) Schematic of a coaxial projector-camera system
(b) Prototype system

Figure 2.1: A coaxial projector-camera system.



2.1 Projector-Camera Correspondence 

For radiometric compensation, a precise mapping between camera and projector pixels has to be established. To avoid parallax errors, a coaxial projector-camera setup, as proposed in [FGN05], is chosen. As shown in Figure 2.1, a beam-splitter is placed in front of both the camera and the projector optics. In addition, the camera is attached to an assembly of mechanical positioners, which allows precise linear and angular positioning of the camera. In a manual calibration procedure, the centers of both fields of view are aligned so that the resulting geometry is scene-independent.
However, a pixel-wise alignment cannot be established in a purely optical manner. Hence, a geometric mapping between points $\mathbf{x}_p = (u_p, v_p)$ in the projected image and points $\mathbf{x}_c = (u_c, v_c)$ in the captured camera image is introduced. For this purpose, piecewise second-order polynomials are used to model this mapping, as proposed in [NPGB03]. The polynomial model for each piece of the image domain can be written as

$$\mathbf{x}_p = \mathbf{A}\,\tilde{\mathbf{x}}_c, \quad \text{where} \quad \tilde{\mathbf{x}}_c = (u_c^2,\; v_c^2,\; u_c v_c,\; u_c,\; v_c,\; 1)^T, \qquad (2.1)$$

and the matrix $\mathbf{A} \in \mathbb{R}^{2 \times 6}$ contains the unknown coefficients of the mapping. These coefficients are computed by least-squares fitting using corresponding points from camera and projector images. Corresponding point pairs are obtained by projecting and capturing a sequence of binary coded markers, which are then extracted by image thresholding and connected component analysis.
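A minimal sketch of the least-squares fit in (2.1), assuming NumPy and a single polynomial piece (the partition of the image domain into pieces is not shown); the function names are hypothetical. Each correspondence contributes one row of second-order monomials, and `numpy.linalg.lstsq` solves for both output coordinates at once.

```python
import numpy as np


def monomials(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Rows (u^2, v^2, u*v, u, v, 1) for each point, i.e. the vector x~_c of (2.1)."""
    return np.column_stack([u**2, v**2, u * v, u, v, np.ones_like(u)])


def fit_mapping(cam_pts: np.ndarray, proj_pts: np.ndarray) -> np.ndarray:
    """Least-squares estimate of the 2x6 matrix A with x_p ≈ A x~_c.

    cam_pts, proj_pts: (N, 2) arrays of corresponding (u, v) points, N >= 6.
    """
    X = monomials(cam_pts[:, 0], cam_pts[:, 1])          # (N, 6) design matrix
    A_T, *_ = np.linalg.lstsq(X, proj_pts, rcond=None)   # solves X @ A.T ≈ proj_pts
    return A_T.T                                         # (2, 6)


def apply_mapping(A: np.ndarray, cam_pts: np.ndarray) -> np.ndarray:
    """Map camera points to projector points with a fitted A."""
    return monomials(cam_pts[:, 0], cam_pts[:, 1]) @ A.T
```

The inverse mapping needed for backward warping can be fitted with the same function by swapping the roles of the point sets, i.e. `fit_mapping(proj_pts, cam_pts)`.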

With (2.1), the corresponding projector pixel can be determined for each camera pixel. However, the calculated pixel positions usually do not fall onto the integer grid of the projector's input image. Hence, the value of each projector pixel has to be determined by interpolation, taking neighboring pixel values into account. Due to the well-known problems of geometric forward image transformations [Wol94], a backward transformation is used to geometrically transform and resample the image prior to projection. The required inverse geometric mapping is obtained by swapping input and output points and fitting the polynomial model as described above. When the geometric backward transformation is applied to an input image, the projected image appears undistorted to the camera, and the captured image matches the original image almost pixel by pixel (see Figure 2.2).
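The backward transformation can be sketched as follows, assuming a single-channel NumPy image and bilinear interpolation (a common choice; the report does not state which interpolation kernel is used). `inverse_map` is a hypothetical callable standing in for the fitted inverse polynomial model: for every pixel of the image sent to the projector, it returns the real-valued source position in the input image. Because every output pixel receives exactly one value, no holes appear, which is the usual reason for preferring backward over forward mapping.

```python
import numpy as np


def bilinear_sample(img, x, y):
    """Sample a 2-D image at real-valued column x and row y; positions outside give 0."""
    h, w = img.shape
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    fx, fy = x - x0, y - y0                       # fractional offsets inside the cell
    valid = (x >= 0) & (y >= 0) & (x <= w - 1) & (y <= h - 1)
    # Clip indices so the gather below is always legal; invalid pixels are zeroed later.
    x0c, x1c = np.clip(x0, 0, w - 1), np.clip(x0 + 1, 0, w - 1)
    y0c, y1c = np.clip(y0, 0, h - 1), np.clip(y0 + 1, 0, h - 1)
    top = (1 - fx) * img[y0c, x0c] + fx * img[y0c, x1c]
    bottom = (1 - fx) * img[y1c, x0c] + fx * img[y1c, x1c]
    return np.where(valid, (1 - fy) * top + fy * bottom, 0.0)


def backward_warp(img, inverse_map, out_shape):
    """Fill every output pixel by looking up its source position in the input image."""
    rows, cols = np.mgrid[0:out_shape[0], 0:out_shape[1]]
    src_x, src_y = inverse_map(cols.astype(float), rows.astype(float))
    return bilinear_sample(img, src_x, src_y)
```

Color images would be warped channel by channel with the same source positions.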



2.2 Physics-Based Modelling for Radiometric Compensation 

Most approaches to radiometric compensation found in the literature are based on the inversion of a radiometric model that describes the image formation in a projector-camera system via a projection screen with spatially varying reflectance [NPGB03, GPNB04, BEK05a, AOSS06]. The model parameters for a static scene are computed by projecting and capturing a set of calibration images. This allows arbitrary images to be compensated before projection onto the same static scene. However, whenever the scene changes, all model parameters have to be recomputed.
Figure 2.2: Geometric transformation of a test projector image (a) and the corresponding captured camera image (b). The geometric transformation of the input image compensates for geometric distortions that may be caused by a misalignment of projector and camera.



Figure 2.3 illustrates the information flow of our projector-camera system and introduces the relevant photometric quantities. Note that the illustrated model applies only to a single scene point, and that projector and camera are restricted to a single spectral channel. However, the model can be applied to any scene point and to all combinations of projector and camera channels.

Let $K \in \{R, G, B\}$ be a color channel of the projector with spectral response $w_K(\lambda)$, where $\lambda$ denotes wavelength. The scalar pixel value $I_K$ in the projector input image is mapped by the electronics of the projector, via the projector intensity transfer function $p_K$, to a projector brightness

$$P_K = p_K(I_K).$$

The function $p_K$ is assumed to be non-linear and monotonic. In addition, it is assumed to be spatially invariant but different for each color channel. Modulating the projector brightness $P_K$ with the spectral response $w_K$ produces the irradiance

$$E_K(\lambda) = P_K\, w_K(\lambda),$$

which illuminates the scene point with spectral reflectance $s(\lambda)$. The radiance of the scene point in the viewing direction of the camera is then

$$L_K(\lambda) = P_K\, w_K(\lambda)\, s(\lambda),$$

which is measured by the camera channel $J \in \{R, G, B\}$ with spectral response function $q_J(\lambda)$. The irradiance detected by the camera’s sensor is then

$$C_J = P_K \int w_K(\lambda)\, s(\lambda)\, q_J(\lambda)\, d\lambda,$$

which is further processed by the electronics of the camera to produce the scalar pixel value

$$M_J = m_J(C_J).$$

The function $m_J$, which relates sensor irradiances to image pixel values, is referred to as the camera transfer function and is assumed to be non-linear and monotonic. Furthermore, it is assumed to be spatially invariant but different for each channel. The single-channel model can be extended to three color channels for projector and camera, i.e., $J, K \in \{R, G, B\}$. In matrix notation this yields

$$\mathbf{C} = \mathbf{V}\mathbf{P}, \qquad (2.2)$$

where

$$\mathbf{C} = \begin{pmatrix} C_R \\ C_G \\ C_B \end{pmatrix}, \quad \mathbf{V} = \begin{pmatrix} V_{RR} & V_{RG} & V_{RB} \\ V_{GR} & V_{GG} & V_{GB} \\ V_{BR} & V_{BG} & V_{BB} \end{pmatrix}, \quad \mathbf{P} = \begin{pmatrix} P_R \\ P_G \\ P_B \end{pmatrix}$$

and

$$V_{JK} = \int w_K(\lambda)\, s(\lambda)\, q_J(\lambda)\, d\lambda.$$

Figure 2.3: Information flow and radiometric quantities for a projector-camera system.

This is the linear part of the radiometric model, which describes the spectral couplings between the projector and the camera channels and their interactions with the spectral reflectance of the scene. Note that the spectral responses of projector and camera are unknown but typically broad and overlapping; thus, the channels mutually interfere with each other.
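To make the coupling matrix concrete, the sketch below evaluates $V_{JK} = \int w_K(\lambda)\, s(\lambda)\, q_J(\lambda)\, d\lambda$ numerically with the trapezoidal rule. All spectra are hypothetical Gaussian and linear curves, chosen only to be broad and overlapping; the real responses are unknown, as noted above. The non-zero off-diagonal entries are exactly the inter-channel interference described in the text.

```python
import numpy as np

lam = np.linspace(400.0, 700.0, 301)   # wavelength grid in nm (hypothetical)


def gaussian(center: float, width: float) -> np.ndarray:
    return np.exp(-0.5 * ((lam - center) / width) ** 2)


# Hypothetical broad, overlapping spectral responses, rows in R, G, B order.
w = np.stack([gaussian(610, 40), gaussian(540, 40), gaussian(460, 40)])  # projector w_K
q = np.stack([gaussian(600, 45), gaussian(530, 45), gaussian(465, 45)])  # camera q_J
s = 0.2 + 0.6 * (lam - 400.0) / 300.0   # hypothetical reflectance rising towards red


def coupling_matrix(w, s, q, lam):
    """V[J, K] = integral of w_K * s * q_J over wavelength (trapezoidal rule)."""
    f = q[:, None, :] * w[None, :, :] * s          # shape (J, K, wavelength)
    dl = np.diff(lam)
    return 0.5 * np.sum((f[..., 1:] + f[..., :-1]) * dl, axis=-1)


V = coupling_matrix(w, s, q, lam)
P = np.array([0.5, 0.8, 0.3])   # projector brightnesses P_K
C = V @ P                       # camera irradiances, Eq. (2.2)
```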

The nonlinear part of the model is expressed by the projector intensity transfer function

$$\mathbf{p}(\mathbf{I}) = \mathbf{P},$$

where $\mathbf{p}(\mathbf{I}) = (p_R(I_R), p_G(I_G), p_B(I_B))^T$, and by the camera transfer function

$$\mathbf{m}(\mathbf{C}) = \mathbf{M},$$

where $\mathbf{m}(\mathbf{C}) = (m_R(C_R), m_G(C_G), m_B(C_B))^T$. The camera transfer function is required by all computer vision tasks that need to measure scene radiance, e.g., photometric stereo or shape from shading, and its recovery is therefore a well-studied problem [GN03]. In our work, the approach described in [MN99] is applied, in which images of a static scene are captured with different exposures and samples of the camera transfer function are implicitly collected. This discrete function is then inverted to obtain discrete samples of the inverse camera transfer function

$$\mathbf{m}^{-1}(\mathbf{M}) = \mathbf{C}, \qquad (2.3)$$

which is needed to map measured camera pixel values to scene radiances.
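Equation (2.3) needs the inverse of a transfer function that is only known at discrete samples. Because the function is monotonic, a minimal sketch is to swap the roles of the sample arrays and interpolate linearly with `numpy.interp`. The gamma-shaped curve below is a hypothetical stand-in for the samples that the multi-exposure method of [MN99] would deliver for one channel.

```python
import numpy as np


def invert_sampled(xs: np.ndarray, ys: np.ndarray, y_query: np.ndarray) -> np.ndarray:
    """Return x ≈ f^{-1}(y_query) for samples ys = f(xs) of an increasing function f.

    np.interp requires ys to be increasing; queries outside [ys[0], ys[-1]]
    are clamped to the end samples.
    """
    return np.interp(y_query, ys, xs)


# Hypothetical camera transfer function m(C) = 255 * C**(1/2.2), sampled at 256 points.
C_samples = np.linspace(0.0, 1.0, 256)
M_samples = 255.0 * C_samples ** (1 / 2.2)

C_est = invert_sampled(C_samples, M_samples, np.array([0.0, 128.0, 255.0]))
```

In practice one such lookup table would be stored per color channel.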

The intensity transfer function of the projector can normally be obtained with a spectroradiometer, but this device was not available for our experiments. However, the off-line calibration procedure described in [NPGB03] allows the model parameters $\mathbf{V}$ to be computed without any prior knowledge of the projector intensity transfer function, provided that the transfer function of the camera is known. The calibration procedure consists of two steps. In the first step, the model parameters in matrix $\mathbf{V}$ are computed for each pixel. To this end, the diagonal entries $V_{KK}$ are treated separately by defining the diagonal matrix $\mathbf{D} = \mathrm{diag}(V_{RR}, V_{GG}, V_{BB})$. Then the matrix $\tilde{\mathbf{V}} = \mathbf{V}\mathbf{D}^{-1}$ is defined so that $\tilde{V}_{KK} = 1$; it models only the mixing between unlike color channels. To determine the 6 model parameters in $\tilde{\mathbf{V}}$, 4 images with constant color have to be projected and captured. For more details on this procedure, see [GPNB04].
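The report defers the details of this first step to [GPNB04]. One way in which six off-diagonal parameters can be obtained from four constant-color images is sketched below; it is an assumption for illustration, not necessarily the exact procedure of [GPNB04]. A base image is projected, followed by three images that each raise only projector channel $K$. By (2.2), the resulting change of the linearized irradiance is $\Delta C_J = V_{JK}\,\Delta P_K$, so the ratio $\Delta C_J / \Delta C_K = V_{JK} / V_{KK} = \tilde{V}_{JK}$ no longer depends on the unknown brightness change $\Delta P_K$, which is consistent with not needing to know $p_K$.

```python
import numpy as np


def estimate_v_tilde(C_base, C_step):
    """Estimate V~ = V D^{-1} per pixel from linearized irradiances (C = m^{-1}(M)).

    C_base: (..., 3) irradiances for the base constant-color image.
    C_step: (3, ..., 3) irradiances when only projector channel K (first axis) is raised.
    Returns an array of shape (..., 3, 3) whose diagonal is 1.
    """
    dC = C_step - C_base[None]                     # [K, ..., J]: change caused by channel K
    dC = np.moveaxis(dC, 0, -1)                    # [..., J, K]
    dC_own = np.diagonal(dC, axis1=-2, axis2=-1)   # [..., K]: Delta C_K = V_KK * Delta P_K
    return dC / dC_own[..., None, :]               # V~_JK = Delta C_J / Delta C_K


# Hypothetical single-pixel check: simulate the four captures with a known V.
V_true = np.array([[0.9, 0.2, 0.05],
                   [0.15, 0.8, 0.1],
                   [0.02, 0.25, 0.7]])
P0 = np.full(3, 0.3)
C_base = V_true @ P0
C_step = np.stack([V_true @ (P0 + 0.4 * np.eye(3)[k]) for k in range(3)])
V_tilde = estimate_v_tilde(C_base, C_step)
```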
In the second step, the non-linear projector intensity transfer function is estimated for each pixel. Substituting $\mathbf{V} = \tilde{\mathbf{V}}\mathbf{D}$ in Equation (2.2) and multiplying by $\tilde{\mathbf{V}}^{-1}$ from the left yields

$$\underbrace{\tilde{\mathbf{V}}^{-1}\mathbf{C}}_{=:\,\tilde{\mathbf{C}}} = \mathbf{D}\mathbf{P}. \qquad (2.4)$$

Then, since $\tilde{\mathbf{C}}$ and $\mathbf{P}$ are vectors and $\mathbf{D}$ is diagonal, (2.4) can be written element-wise as

$$\tilde{C}_K = V_{KK}\, p_K(I_K), \qquad (2.5)$$

for each channel $K \in \{R, G, B\}$. Equation (2.5) shows that $\tilde{C}_K$ depends only on the input pixel value $I_K$ of channel $K$ and is independent of the other channels. By defining $p'_K(I_K) := V_{KK}\, p_K(I_K)$, the unknown scaling factors $V_{KK}$ are absorbed into the unknown projector intensity transfer function $p'_K$, and Equation (2.5) can be written as

$$\tilde{C}_K = p'_K(I_K).$$

This means that $p'_K$ can be sampled by projecting constant images for each projector pixel value $I_K = 1, \ldots, 255$, capturing the corresponding images $M_K$, and determining $\tilde{C}_K$ via Equations (2.3) and (2.4). The samples are then inverted in order to obtain the inverse projector intensity transfer function $p'^{-1}_K$ (see [NPGB03] for a detailed explanation of this procedure). A similar method for recovering the inverse intensity transfer function is presented in [GPNB04], but there only two calibration images instead of 260 are required.

Once all model parameters have been determined off-line for a static scene by the calibration procedure, a compensation image can be computed on-line for any desired camera image. For each desired camera pixel value $\mathbf{M} = (M_R, M_G, M_B)^T$, the corresponding projector pixel value $\mathbf{I} = (I_R, I_G, I_B)^T$ can be computed according to

$$\mathbf{I} = \mathbf{p}'^{-1}\big(\tilde{\mathbf{V}}^{-1}\,\mathbf{m}^{-1}(\mathbf{M})\big).$$
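Per pixel, the compensation formula is a composition of three inverse maps. The sketch below assumes that $\mathbf{m}^{-1}$ and $\mathbf{p}'^{-1}$ are available as element-wise callables (hypothetical stand-ins for the sampled lookup tables) and solves the $3 \times 3$ system instead of forming $\tilde{\mathbf{V}}^{-1}$ explicitly, which is numerically preferable. The round trip through a simulated forward model checks that the composition is in the right order.

```python
import numpy as np


def compensate(M_desired, m_inv, V_tilde, p_prime_inv):
    """Projector input I = p'^{-1}(V~^{-1} m^{-1}(M)) for a single pixel."""
    C = m_inv(np.asarray(M_desired, dtype=float))   # pixel values -> irradiances, Eq. (2.3)
    C_tilde = np.linalg.solve(V_tilde, C)            # undo channel mixing, Eq. (2.4)
    return p_prime_inv(C_tilde)                      # decoupled irradiance -> input, Eq. (2.5)


# Hypothetical per-pixel calibration results.
V_tilde = np.array([[1.0, 0.2, 0.05],
                    [0.1, 1.0, 0.1],
                    [0.02, 0.3, 1.0]])
p_prime = lambda I: 0.6 * (I / 255.0) ** 2           # p'_K, identical for all channels here
p_prime_inv = lambda Ct: 255.0 * np.sqrt(Ct / 0.6)
m = lambda C: 255.0 * C ** (1 / 2.2)                 # camera transfer function
m_inv = lambda M: (M / 255.0) ** 2.2

I_true = np.array([120.0, 200.0, 60.0])
M = m(V_tilde @ p_prime(I_true))                     # what the camera would see
I_est = compensate(M, m_inv, V_tilde, p_prime_inv)   # recovers I_true
```

In practice the result would also have to be clipped to the projector's input range, since not every desired camera image is physically reachable.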

Note that, since the radiometric model is only valid as long as the scene remains static, the calibration procedure must be repeated whenever the scene changes, which makes this method inefficient when experimenting with different scenes and objects. Furthermore, we encountered problems in recovering the projector non-linearity introduced by the intensity transfer function. Both calibration methods, proposed in [GPNB04] and [NPGB03], were attempted to determine the relationship between camera irradiance and projector input, but with limited success. Both approaches yielded samples of the intensity transfer function which, however, were unsuitable for inversion due to strong noise.
2.3 Error-Feedback Approach To Radiometric Compensation

In [FGN05] and [NPGB03], a method for radiometric compensation is proposed that uses the error between the desired and measured appearance of the projected image. The appearance of the projected image is continually measured by the camera, and the computed error is used to adapt the projected image until it meets the desired appearance. Let $\mathbf{I}$ be the desired original image and $\mathbf{M}(t)$ the corresponding measured image when $\mathbf{I}(t)$ is projected. At $t = 0$, the algorithm starts by projecting $\mathbf{I}(0) = \mathbf{I}$. The adapted compensation image for time $t + 1$ is then computed as

$$\mathbf{I}(t+1) = \mathbf{I}(t) + \alpha\,\big(\mathbf{I} - \mathbf{M}(t)\big),$$

where $\alpha \in (0, 1)$ is a gain factor and addition is defined component-wise for each color channel separately. Thus, a pixel that is measured too bright receives a lower projected value in the next iteration, and vice versa. By setting $\mathbf{I}$ to a flat-gray image and letting the error-feedback algorithm converge to a small constant error, $\mathbf{I}(t)$ becomes the inverse illumination mask. Typically, the algorithm converges after a few iterations (see Figure 3.1(d)) and is therefore ideally suited for dynamic scenes.
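A small simulation of the error-feedback loop, written with NumPy. The scene is an idealized, hypothetical one in which the camera measures reflectance times projected value; real projector and camera non-linearities are not modeled. Under this assumption, each pixel's error shrinks by the factor $1 - \alpha r$ per iteration (with $r$ the pixel's reflectance), which illustrates why only a few iterations are needed. The per-channel RMS history is the kind of curve shown in Figure 3.1(d).

```python
import numpy as np


def error_feedback(measure, target, alpha=0.7, iterations=15):
    """Iterate I(t+1) = I(t) + alpha * (target - M(t)) and record per-channel RMS errors.

    measure: callable that "projects" an (H, W, 3) image and returns the captured image.
    """
    I = target.astype(float).copy()                       # I(0) = desired image
    rms_history = []
    for _ in range(iterations):
        M = measure(I)                                    # capture M(t) while I(t) is projected
        err = target - M
        rms_history.append(np.sqrt(np.mean(err**2, axis=(0, 1))))   # one value per channel
        I = np.clip(I + alpha * err, 0.0, 255.0)          # keep inside the projector range
    return I, rms_history


# Hypothetical scene: per-pixel, per-channel reflectance and an idealized linear camera.
rng = np.random.default_rng(1)
reflectance = rng.uniform(0.5, 1.0, size=(32, 32, 3))
measure = lambda I: np.clip(reflectance * I, 0.0, 255.0)
target = np.full((32, 32, 3), 100.0)                      # flat-gray inspection image

mask, rms = error_feedback(measure, target)               # mask approximates 100 / reflectance
```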



3 Experimental Results 



Due to the aforementioned problems associated with the model-based approach, the error-feedback method presented in Section 2.3 is used to compute the inverse illumination mask of a test scene. Figure 3.1 summarizes the results obtained for a scene that consists of a flat colored background and a gypsum bust in the foreground. Figure 3.1(a) shows the measured camera image when a flat-gray image is projected. This resembles a coaxial bright-field illumination of the scene. Figure 3.1(b) shows the inverse illumination mask obtained by the feedback algorithm after 15 iterations. When this image is projected onto the scene, the camera measures a nearly constant gray image (see Figure 3.1(c)). Figure 3.1(d) shows the root mean square (RMS) error [GW08] between the desired flat-gray image and the actually measured camera image for each iteration. As can clearly be seen, the RMS error decreases rapidly, and after approximately five iterations no further decrease is observable.



To evaluate the feasibility of the inverse illumination method for inspection tasks, the test scene is modified with small paper patches to simulate deviations from the desired nominal state of our scene. Then, the actual (modified) scene is illuminated with the inverse illumination mask of the nominal scene, and an image is captured. The results are summarized in Figure 3.2. As expected, deviations from the nominal scene, which are simulated by colored paper patches, are clearly highlighted by the inverse illumination mask. Since deviations in reflectance and shape are directly displayed in the captured inspection image, the effort required to detect them can be reduced. For instance, conventional image segmentation techniques based on color thresholding [GW08] may be used to identify the faulty regions in the inspection image (compare Figures 3.3(a) and 3.3(b)).

Figure 3.1: Experimental results of the error-feedback algorithm. (a) Image of the scene when a flat-gray image is projected. (b) Inverse illumination mask computed by the error-feedback algorithm. (c) Measured camera image when the inverse illumination mask is projected onto the scene. (d) RMS error between the desired flat-gray image and the actually measured camera image for each iteration of the error-feedback algorithm (red, green, and blue indicate the error for each color channel).
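The color-thresholding idea of Figure 3.3 can be sketched as an axis-aligned box test in RGB space, assuming NumPy. The gray level and box half-width are hypothetical parameters; in practice they could be read off the color distribution of the nominal inspection image.

```python
import numpy as np


def deviation_mask(image, gray_level, half_width):
    """True where a pixel's RGB vector lies outside the box centered on (gray, gray, gray).

    image: (H, W, 3) inspection image captured under the inverse illumination mask.
    """
    inside = np.all(np.abs(image.astype(float) - gray_level) <= half_width, axis=-1)
    return ~inside


# Hypothetical inspection image: flat gray with one colored patch.
img = np.full((10, 10, 3), 100.0)
img[2:4, 5:8] = [150.0, 80.0, 90.0]
mask = deviation_mask(img, gray_level=100.0, half_width=15.0)
```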



Of course, a result similar to that in Figure 3.2(d) can be obtained by capturing images of the nominal and the actual (modified) scene under conventional bright-field illumination and subtracting them from one another to obtain a difference image. However, since the usable dynamic range of digital cameras used in machine vision is much smaller than the dynamic range of most technical scenes, saturation and underexposure are common. Consequently, if the camera exposure is set so that the brightest scene point is captured without saturation, image regions of low radiance are captured with a low signal-to-noise ratio (SNR).

Figure 3.2: Inverse illumination to highlight deviations from the desired nominal state of a test scene. (a) Nominal state of the scene with its inverse illumination mask shown as inset. (b) Scene modified by paper patches. (c) Original scene and (d) modified scene illuminated with the inverse illumination mask of the nominal state of the test scene.

By illuminating a scene with its inverse illumination mask, the exposure of each pixel on the camera sensor is adapted independently. The notion of spatially varying pixel exposures was first introduced in [NM00], at the cost of reduced spatial resolution. Some novel approaches have been introduced in [NB03, NBB04, MSM+07], where different optical attenuators such as liquid crystals are used to control the incident irradiance of each camera pixel. In our approach, the exposure of each camera pixel is controlled indirectly by changing the irradiance of the corresponding scene point, so that all scene radiance values are brought to the same predefined, measurable camera pixel value. Consequently, an inspection image with spatially constant SNR is acquired. Furthermore, by computing the inverse illumination mask that corresponds to an optimally exposed flat-gray inspection image just below saturation, deviation detection is performed in the image signal region with the highest SNR.

Figure 3.3: Segmentation in RGB color space by enclosing data regions. (a) Color distribution of the image pixels in Figure 3.2(c). Each image pixel corresponds to a vector in the RGB color space. The pixels are densely clustered and are enclosed by a bounding box centered on the color gray. (b) Color distribution of the image pixels in Figure 3.2(d). Points outside the bounding box belong to image regions that mark deviations from the nominal state of the scene.
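The SNR argument can be illustrated with a simple, assumed noise model in which a pixel collecting $S$ electrons has shot noise $\sqrt{S}$ plus a constant read noise $\sigma_r$, so that $\mathrm{SNR}(S) = S / \sqrt{S + \sigma_r^2}$. The full-well capacity, read noise, and radiance values below are hypothetical, and the sketch assumes that the projector can supply the irradiance ratio demanded by the inverse mask.

```python
import numpy as np

FULL_WELL = 10_000.0    # hypothetical full-well capacity in electrons
READ_NOISE = 10.0       # hypothetical read noise in electrons (rms)


def snr(signal_e):
    """SNR for a pixel collecting signal_e electrons: shot noise plus read noise (assumed model)."""
    signal_e = np.asarray(signal_e, dtype=float)
    return signal_e / np.sqrt(signal_e + READ_NOISE**2)


radiance = np.array([1.0, 0.5, 0.1, 0.02])   # hypothetical relative scene radiances

# Uniform illumination: exposure set so that the brightest point just reaches full well.
uniform = snr(FULL_WELL * radiance / radiance.max())

# Inverse illumination: every scene point is driven to the same level just below saturation.
inverse = snr(np.full_like(radiance, 0.95 * FULL_WELL))
```

Under these assumptions the darkest point drops to an SNR of roughly 11.5 with uniform illumination, whereas every point stays at about 97 with the inverse mask.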



4 Summary 

An illumination technique that directly highlights deviations in reflectance between two scenes is proposed. This is achieved by generating an inverse illumination mask of the desired nominal state of the scene, which “neutralizes” the appearance of the nominal state to a flat-gray image. In this way, inspection images with high and constant SNR are obtained, which can make the subsequent digital image processing tasks for defect detection more efficient and reliable. In future work, we will investigate the common principle and possible advantages of scene-adapted illumination techniques more formally and attempt to apply this principle to develop novel illumination techniques. Furthermore, projector-camera systems in general will be utilized and investigated as programmable light sources for machine vision applications.
Abstract: Gaze data contains valuable information about the user’s cognitive processes during the execution of a task. In order to use this information, e.g., for studying users’ strategies or for designing new gaze-based interaction techniques for HCI, gaze data needs to be aligned with the task executed by the user.

In this paper, we propose a novel framework based on the theory of Markov Decision Processes for putting gaze data into context, allowing for automated interpretation of gaze position and movement with respect to the task performed by the user. The model can be used for offline analysis of gaze data, e.g., for studying gaze behavior, as well as for online interpretation for realizing new interaction techniques. We evaluate the proposed model with an indirect object manipulation task and demonstrate how it can be used for intention recognition and/or detection of a mismatch between the mental model built by the user and the real system.



1 Introduction 

Visual perception is an important information channel during manipulation of real or virtual objects (e.g., icons on a graphical user interface). It allows for perceiving the current state of manipulated objects and/or for visuomotor control of manipulators like our hands or a computer mouse. During manipulation tasks, our gaze behavior is mainly controlled top-down and subconsciously by the cognitive processes responsible for task execution. Therefore, natural gaze behavior provides a window into the human mind and allows conclusions to be drawn about the user’s intentions and cognitive processes. This information could be very valuable for future adaptive and proactive human-computer interfaces (HCIs).
However, natural gaze behavior is very complex and is influenced by a vast number of factors. Therefore, in order to integrate natural gaze as an additional modality into HCIs, the following steps need to be undertaken:

1. A thorough understanding of natural gaze behavior in dynamic environ- 
ments needs to be established. 

2. Methods for automated online analysis of gaze data in the context of 
dynamic environments need to be developed. 

3. New multimodal gaze-supported HCIs need to be designed, implemented, 
and evaluated. 

All three points above have already been addressed by a large body of research. Natural gaze behavior has been studied in different natural environments (e.g., during block copying [PHL01], basic object manipulation [JWBF01], driving [LL94], and playing cricket [LM00]) as well as during human-computer interaction (e.g., [SHAZ00]) and in the fields of psychology and physiology (e.g., [GBOF08, FJ03]). However, the results of these studies, in the form of the different gaze behaviors observed during task execution, are mostly reported informally, e.g., as verbal descriptions or as plots of gaze data. Such descriptions can help to improve the principal understanding of gaze; however, a more formal description of different gaze behaviors in different contexts is required in order to make results comparable and accessible to automated interpretation.

General methods for automated online analysis of gaze data are currently mainly limited to fixation detection [SG00, SA01] and analysis of fixation frequency, duration, and position (e.g., [SA00]). Alignment of gaze data with the task or cognitive processes is often done manually or is restricted to static environments (e.g., [SG00, HMR03, SA01]). In order to develop new methods for interpreting natural gaze behavior in arbitrary dynamic environments, a common formal framework would be very helpful.

Most state-of-the-art gaze-based interfaces use gaze as an explicit pointing device, e.g., as a replacement for a mouse [Lan00]. This requires gaze to be used for manipulation (e.g., for pressing keys on a virtual keyboard [Lan00]) in addition to its natural purpose, namely visual perception. Such interaction techniques might be useful for certain applications, e.g., when hands are not available as an input modality. However, using gaze-based pointing as a general input technique for human-computer interaction has many limitations (see [BVK09]). Promising examples of gaze-supported interaction techniques are presented in [HMR03] and [ZMI99]. In both approaches, natural gaze behavior is analyzed and the user is not forced to deviate from that natural behavior for interaction purposes. iDict [HMR03] analyzes the duration of fixations while the user reads a text in a foreign language and automatically provides a translation of the fixated word if a longer



A Framework for Analyzing Natural Gaze Behavior 






fixation is detected. In the approach “Manual And Gaze Input Cascaded (MAGIC) Pointing” [ZMI99], the mouse pointer is placed close to the currently fixated object in order to eliminate a large portion of the cursor movement. Neither approach uses gaze directly as a pointing or input device; instead, both interpret gaze data in the context of the task (reading, pointing). However, in neither approach is the link between gaze and cognition made explicit, e.g., in the form of a model. This limits generalization and the development of a deeper understanding of the underlying principles of such techniques. The need for modeling the dependencies between gaze and the task has also been stated by other researchers (e.g., [SA00] suggest using tools from cognitive modeling).

In this paper, we propose a new framework based on the theory of Markov Decision Processes (MDPs), which can provide a common ground for all three of the above-mentioned steps towards adequate interpretation of natural gaze behavior in dynamic environments and its use as an additional modality in future HCIs. In particular, we consider the following aspects as important to be covered by such a framework:

• Uncertainty in the user’s knowledge about the system being interacted with seems to play an important role in explaining and interpreting natural gaze behavior [BVK09]. In contrast to traditional approaches to cognitive modeling of tasks like GOMS [Joh90], the proposed framework explicitly allows for modeling uncertain knowledge.

• Multiple strategies may lead to a desired goal when interacting with a system. Natural gaze behavior can therefore be influenced by more than one of those strategies and by the choice between them. In the proposed framework, multiple possible strategies can be explicitly modeled and/or generated automatically.

• Simplicity of the framework is important for keeping it applicable to the above-mentioned steps. Even though important components of cognition like short-term memory are not explicitly covered by the framework, it provides a good basis for investigating and interpreting natural gaze behavior for many tasks.

The framework proposed here is an extension of the work presented in [BVK09], where fundamental causal relations between task execution and gaze behavior are discussed and modeled in a probabilistic framework.



2 Framework for alignment of gaze data 

In this section, we propose a formal framework for modeling the interdependencies between gaze data and the task executed by the user. We first derive the formal model, then show how gaze behavior can be classified into different categories related to the current state of task execution. Finally, we illustrate what can further be inferred from the model regarding intention recognition and the detection of model mismatches.



2.1 A formal model of an interactive system and its mental representation

The human decision process during interaction with an interactive system consisting of a number of interactive objects can be modeled by the 4-tuple $(\mathcal{Q}^o, \mathcal{I}, P(Q^o_{t+1} \mid Q^o_t, I_t), c(\cdot)^o)$, which defines an MDP. We denote by $\mathcal{Q}^o$ the set of all possible states of an object $o$. $\mathcal{I}$ is the set of possible inputs or actions that can be performed by the user in order to change the system state. $P(Q^o_{t+1} = q^o_{t+1} \mid Q^o_t = q^o_t, I_t = i_t)$ denotes the probability of a state transition of object $o$ from state $q^o_t$ to state $q^o_{t+1}$ if input $i_t \in \mathcal{I}$ is performed by the user. The function $c$ defines costs for each system state and state transition, respectively. These costs can reflect the relevance of certain system states for the success of the whole task, but also the cognitive or physical load induced by certain inputs. Without loss of generality, we focus on a single object in the following and therefore omit the index $o$ in the formulas above.

Usually, state transitions in a technical interactive system, in contrast to those in natural ones, are deterministic. Hence, the state transition probabilities $P^T$ for most technical systems reduce to

$$P^T(Q_{t+1} = \hat{q}_{t+1} \mid Q_t = q_t, I_t = i_t) = 1 \quad \text{and} \quad P^T(Q_{t+1} \neq \hat{q}_{t+1} \mid Q_t = q_t, I_t = i_t) = 0,$$

where $\hat{q}_{t+1}$ denotes the state induced by input $i_t$. However, the mental model of such a deterministic system can be incomplete and/or uncertain. Since gaze behavior is determined by this mental representation of the interactive system ($P^M$) and not by the real one ($P^T$), we chose to use a probabilistic framework in order to allow for modeling of effects such as a model mismatch ($P^M \neq P^T$). The different basic components of the model are illustrated in Figure 2.1.

When interacting with a system, the user usually has a certain goal in mind. He/she wants the system to reach a certain target state, which has to fulfill some requirements. We therefore model a goal as a subset of system states $\hat{\mathcal{Q}} \subseteq \mathcal{Q}$ with the indicator function

$$\mathbb{1}_{\hat{\mathcal{Q}}}(q) = \begin{cases} 1 & \text{if } q \in \hat{\mathcal{Q}}, \\ 0 & \text{otherwise.} \end{cases}$$

Figure 2.1: Basic components of the model.



Such a goal can, for example, be the selection of a set of objects or their positioning in a target area on the screen. A complete task can consist of multiple subgoals, which have to be reached one after another.

To bring an interactive system from its initial state $q^0$ into a target state $q^f \in \hat{\mathcal{Q}}$, the user has to execute a sequence of actions. In most tasks, not just one action sequence leads to the target state; many different “ways” can be chosen by the user. The different policies the user can follow are described as a set of functions

$$\pi_i^{\hat{\mathcal{Q}}}(q): \mathcal{Q} \to \mathcal{I}, \quad i = 1, \ldots, N_\pi,$$

where the function $\pi_i^{\hat{\mathcal{Q}}}(q)$ specifies the action the user will choose according to policy $i$ when the system is in state $q$ and the goal is $\hat{\mathcal{Q}}$. $N_\pi$ is the number of possible policies.

Given a certain task with an initial state $q^0$ and a goal $\hat{\mathcal{Q}}$, the user has to decide which actions to execute in order to reach the goal. $P^M(Q_{t+1} \mid Q_t = q_t, I_t = i_t)$ reflects the knowledge the user has about the system, namely which effect a certain input $i_t$ has on the state of an object and the system, respectively. If the user had perfect knowledge of the system ($P^M = P^T$), he could calculate the set of optimal policies for reaching the goal according to his internal value function $c$, which describes the costs of executing the different inputs and the amount of reward for reaching certain system states. If the user had no knowledge of the system, he could have the same value function $c$ but would not be able to calculate any policy, since the effect of actions on the system state would be unknown. Therefore, in this case the user would have to build a mental model of the system before or in parallel to the execution of the primary task.
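To illustrate how an optimal policy follows from $P$ and $c$, the sketch below runs value iteration, a standard MDP solver chosen here as an assumption (the framework does not prescribe how policies are computed), on a hypothetical one-dimensional positioning task. Running it once with a deterministic $P^T$ and once with an uncertain mental model $P^M$ shows that the same policy can result while the expected cost differs.

```python
import numpy as np

N_STATES = 5                                 # hypothetical object positions 0..4
GOAL = {4}                                   # goal set: the indicator is 1 only on state 4
ACTIONS = {"left": -1, "stay": 0, "right": +1}
COST = 1.0                                   # hypothetical cost c of every input


def step(q, a):
    """Position reached by input a from position q (clamped to the grid)."""
    return min(max(q + ACTIONS[a], 0), N_STATES - 1)


def deterministic_model(q, a):
    """P^T: the input always has its intended effect."""
    return {step(q, a): 1.0}


def uncertain_model(q, a):
    """Hypothetical mental model P^M: a move succeeds with probability 0.8."""
    nxt = step(q, a)
    return {q: 1.0} if nxt == q else {nxt: 0.8, q: 0.2}


def value_iteration(model, sweeps=100):
    """Expected cost-to-go V(q) and a greedy policy pi(q) for reaching GOAL."""
    V = np.zeros(N_STATES)

    def q_value(q, a):
        return COST + sum(p * V[n] for n, p in model(q, a).items())

    for _ in range(sweeps):
        for q in range(N_STATES):
            if q not in GOAL:                # goal states keep zero remaining cost
                V[q] = min(q_value(q, a) for a in ACTIONS)
    policy = {q: min(ACTIONS, key=lambda a: q_value(q, a))
              for q in range(N_STATES) if q not in GOAL}
    return V, policy


V_T, pi_T = value_iteration(deterministic_model)
V_M, pi_M = value_iteration(uncertain_model)
```

The sketch returns a single greedy policy; collecting all actions that attain the minimum in each state would yield the set of optimal policies mentioned in the text.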
2.2 Two reasons why we look where we look

In this section, we analyze what kind of natural gaze behavior would be predicted by the above model. Generally, we have two reasons to direct our visual attention to a certain location in an interactive environment.

The first reason is the control of input $i$. Once the human operator has decided to execute a certain action in order to reach the goal, it is important to perform this input as accurately and quickly as possible. Since absolute positioning of limbs using only proprioceptive feedback is not very accurate [BRS03], vision is needed as an additional feedback channel for accurate positioning, e.g., of the hand. An input can be controlled either directly, by observing the input device or body part, or indirectly, by observing the system reaction.

The second reason for directing our visual attention to a certain location is the verification of system reactions or states. If we had a perfect mental model of the system and complete and accurate non-visual feedback about our actions $i$, we could execute the task successfully with our eyes closed. However, neither assumption is realistic. In most cases, the mental model is incomplete or uncertain, and, as mentioned above, non-visual feedback channels (e.g., proprioception or touch) are not accurate enough. Therefore, in order to create, improve, or verify the mental model of the system, the user has to check continually whether a certain input leads to the anticipated system reaction or not. The more confident the mental model is, the less verification of the system reaction is required.

To investigate these two processes, which concurrently influence natural gaze behavior during human-computer interaction, gaze data must be aligned with the task and the system state.



2.3 Methods for alignment of gaze data 

To align gaze data with the model presented above, we use both the absolute position and the movement of gaze. Gaze movements are typically separated into two components: fixations and saccades. Saccades are rapid eye movements that move the gaze to a certain position, whereas during fixations the gaze remains almost still so that visual information can be retrieved. Two different algorithms have been implemented for automated fixation detection, both taken from [SG00]: “I-DT” (Dispersion-Threshold Identification) and “I-VT” (Velocity-Threshold Identification). The first algorithm clusters gaze points according to their spatial distribution, the second according to the velocity of gaze movements.
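The two fixation detectors can be sketched as follows. This is a minimal Python illustration in the spirit of [SG00], not the implementation used in this study; the function names, the constant sampling interval `dt`, and the thresholds `v_max`, `d_max` and `min_len` are assumptions, and dispersion is taken as the sum of the x and y extents of a window.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # one gaze sample (x, y), e.g. in pixels


def _centroid(pts: List[Point]) -> Point:
    """Mean position of a group of samples (the reported fixation location)."""
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def _dispersion(pts: List[Point]) -> float:
    """Dispersion as (max x - min x) + (max y - min y)."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def i_vt(samples: List[Point], dt: float, v_max: float) -> List[Point]:
    """Velocity-threshold identification (I-VT).

    A sample whose speed relative to its predecessor is below v_max
    (units per second) is a fixation sample; consecutive fixation
    samples are merged into one fixation.
    """
    fixations: List[Point] = []
    run: List[Point] = []
    for k, p in enumerate(samples):
        if k == 0:
            speed = 0.0
        else:
            q = samples[k - 1]
            speed = math.hypot(p[0] - q[0], p[1] - q[1]) / dt
        if speed < v_max:
            run.append(p)
        else:  # saccade sample: close the current fixation group
            if run:
                fixations.append(_centroid(run))
            run = []
    if run:
        fixations.append(_centroid(run))
    return fixations


def i_dt(samples: List[Point], d_max: float, min_len: int) -> List[Point]:
    """Dispersion-threshold identification (I-DT).

    A window of at least min_len samples whose dispersion stays at or
    below d_max is a fixation; the window is grown while this holds.
    """
    fixations: List[Point] = []
    start, n = 0, len(samples)
    while start + min_len <= n:
        end = start + min_len
        if _dispersion(samples[start:end]) <= d_max:
            while end < n and _dispersion(samples[start:end + 1]) <= d_max:
                end += 1
            fixations.append(_centroid(samples[start:end]))
            start = end  # continue after the fixation window
        else:
            start += 1  # slide the window by one sample
    return fixations
```

I-VT needs only one pass over the data, while I-DT compares whole windows and is therefore less sensitive to single noisy samples.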
In the following, we present methods for interpreting gaze data in an interactive environment in which object and system states are represented visually on a two-dimensional planar display. The state of an object in such an environment can be described by $q_t = (p_t, a_t)$, where $p_t \in \mathbb{N}^2$ is the position of the visual representation of the object on the display and $a_t$ describes further attributes of the object. Fixation positions are denoted by $f_t \in \mathbb{N}^2$. In order to analyze gaze positions in the context of a task, we calculate the difference vector $v_t = f_t - p_t$. Additionally, we define the vector $w^i_t = \hat{p}_{t+1} - p_t$ with

$$\hat{p}_{t+1} = \arg\max_{p_{t+1}} P_T\big(Q_{t+1} = (p_{t+1}, a_{t+1}) \mid Q_t = q_t,\; I_t = \pi_i(q_t)\big)$$

as the position of the most probable next object state given a certain policy $\pi_i$. Instead of a single step, $w^i_t$ could also be calculated by accumulating multiple steps of the policy up to a certain prediction horizon.
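A sketch of how $w^i_t$ could be computed over a prediction horizon, assuming that the most probable successor state under $P_T$ is available as a function `step(q, i)`. All names (`step`, `policy`, `horizon`) are hypothetical, and the toy transition at the end only serves as an example.

```python
from typing import Any, Callable, Tuple

Vec = Tuple[float, float]
State = Tuple[Vec, Any]                 # q_t = (p_t, a_t)
Policy = Callable[[State], Any]         # pi_i: state -> input i_t
Step = Callable[[State, Any], State]    # most probable successor of (q_t, i_t)


def predicted_displacement(q: State, policy: Policy, step: Step,
                           horizon: int = 1) -> Vec:
    """w_t^i = p_{t+H} - p_t, obtained by following policy pi_i for H steps.

    With horizon=1 this is p_hat_{t+1} - p_t; larger horizons accumulate
    the displacements of several predicted steps.
    """
    p0 = q[0]
    for _ in range(horizon):
        q = step(q, policy(q))
    return (q[0][0] - p0[0], q[0][1] - p0[1])


# Toy example (hypothetical): the input shifts the object horizontally.
def shift_right(q: State, i: Any) -> State:
    (x, y), a = q
    return ((x + i, y), a)


def constant_policy(q: State) -> Any:
    return 2
```

For instance, `predicted_displacement(((0.0, 0.0), None), constant_policy, shift_right, horizon=3)` yields `(6.0, 0.0)`.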

Every fixation is further classified into one of the following categories:

$P^{f,i}$: $(\|v_t\| > v_0) \wedge (\angle(v_t, w^i_t) < \beta_{\max})$

$O^{f,i}$: $(\|v_t\| < v_0) \wedge (w^i_t = 0)$

$N^f$: $(\|v_t\| > v_0) \wedge (\forall i: \angle(v_t, w^i_t) > \beta_{\max})$

$O^f$: $(\|v_t\| < v_0) \wedge (\forall i: w^i_t \neq 0)$

Fixations of category $P^{f,i}$ are proactive with respect to policy $\pi_i$. Category $O^{f,i}$ contains fixations on objects along a certain policy with no change in object position, while $O^f$ describes all other fixations on an object. $N^f$ contains all fixations belonging to none of the other categories.
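The four fixation categories can be evaluated directly from $f_t$, $p_t$ and the vectors $w^i_t$. The following Python sketch is one possible reading of the conditions; returning all matching policy indices (a fixation may be proactive for several policies), assigning the boundary case $\angle = \beta_{\max}$ to $N^f$, and using $\le v_0$ for on-object fixations are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def angle(a: Vec, b: Vec) -> float:
    """Angle between two 2D vectors in radians (inf if one is the zero vector)."""
    na, nb = math.hypot(*a), math.hypot(*b)
    if na == 0.0 or nb == 0.0:
        return math.inf
    c = (a[0] * b[0] + a[1] * b[1]) / (na * nb)
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding


def classify_fixation(f: Vec, p: Vec, w: Sequence[Vec],
                      v0: float, beta_max: float) -> Tuple[str, List[int]]:
    """Return the category and the indices i of the policies it refers to.

    w[i] is the predicted displacement w_t^i of policy pi_i.
    """
    v = (f[0] - p[0], f[1] - p[1])  # v_t = f_t - p_t
    if math.hypot(*v) > v0:  # fixation away from the object
        hits = [i for i, wi in enumerate(w) if angle(v, wi) < beta_max]
        return ("P_f_i", hits) if hits else ("N_f", [])
    # fixation on the object: does some policy predict no movement?
    hits = [i for i, wi in enumerate(w) if wi[0] == 0 and wi[1] == 0]
    return ("O_f_i", hits) if hits else ("O_f", [])
```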

The criteria for the different categories depend only on the system state at a single point in time. The time delay in fixation classification incurred by a previous approach presented in [BVK09] can therefore be avoided.

Taking only absolute fixation positions into account for gaze analysis may lead to problems if the gaze position measurement is subject to drift, as is often the case with current eye tracking hardware. Therefore, we also classify gaze movements according to their spatial relation to object and target positions as well as to possible policies. We define saccades as $\Delta f_t = f_t - f_{t-1}$ and assign them to one of the following classes:

$P^{s,i}$: $(\|v_t\| > \|v_{t-1}\|) \wedge (\angle(\Delta f_t, w^i_t) < \beta_{\max})$

$P^s$: $(\|v_t\| > \|v_{t-1}\|) \wedge (\forall i: \angle(\Delta f_t, w^i_t) > \beta_{\max})$

$R^s$: $\|v_t\| < \|v_{t-1}\|$
Categories $P^{s,i}$ and $P^s$ contain proactive saccades, $R^s$ reactive ones. Saccades that are too small or too far away from an object are filtered out by the conditions $\|\Delta f_t\| \geq \Delta f_{\min}$ and $(\|v_t\| \leq v_s) \vee (\|v_{t-1}\| \leq v_s)$, where $\Delta f_{\min}$ is the minimal length of a saccade and $v_s$ the maximal distance to an object.
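Analogously, a saccade can be classified from two consecutive fixations. This sketch applies the two filter conditions first and then the class conditions; the parameter names `df_min` and `v_s` mirror $\Delta f_{\min}$ and $v_s$, and treating equal distances $\|v_t\| = \|v_{t-1}\|$ as unclassified is an assumption.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float]


def angle(a: Vec, b: Vec) -> float:
    """Angle between two 2D vectors in radians (inf if one is the zero vector)."""
    na, nb = math.hypot(*a), math.hypot(*b)
    if na == 0.0 or nb == 0.0:
        return math.inf
    c = (a[0] * b[0] + a[1] * b[1]) / (na * nb)
    return math.acos(max(-1.0, min(1.0, c)))


def classify_saccade(f_prev: Vec, f: Vec, p_prev: Vec, p: Vec,
                     w: Sequence[Vec], beta_max: float, df_min: float,
                     v_s: float) -> Optional[Tuple[str, List[int]]]:
    """Return (class, policy indices), or None if the saccade is filtered out."""
    df = (f[0] - f_prev[0], f[1] - f_prev[1])                      # Delta f_t
    d_prev = math.hypot(f_prev[0] - p_prev[0], f_prev[1] - p_prev[1])  # ||v_{t-1}||
    d = math.hypot(f[0] - p[0], f[1] - p[1])                       # ||v_t||
    # filters: saccade too short, or both end points too far from the object
    if math.hypot(*df) < df_min or (d > v_s and d_prev > v_s):
        return None
    if d < d_prev:  # gaze moves towards the object: reactive
        return ("R_s", [])
    if d > d_prev:  # gaze moves away from the object: proactive
        hits = [i for i, wi in enumerate(w) if angle(df, wi) < beta_max]
        return ("P_s_i", hits) if hits else ("P_s", [])
    return None  # equal distance: no class applies
```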

Proactive gaze behavior that can be assigned to a certain policy (categories $P^{f,i}$, $O^{f,i}$, and $P^{s,i}$) can be used for estimating the user’s intention. Especially for tasks with multiple possible policies, the policy the user will choose could be identified before any object movement, enabling the design of new proactive interaction techniques. Additionally, if the policy intended by the user is known from his gaze movements, a model mismatch can be detected when a different policy is actually executed. In such a case, a hint could be displayed to the user in order to resolve the model mismatch and reduce training time. More generally, the interface can adapt automatically to novice or expert users.

Reactive gaze movements are of no value for intention recognition, since they do not convey any information beyond what is already available from observing system state changes. However, a high number of reactive fixations might indicate that the user is uncertain.



3 Evaluation 



3.1 Participants and Task 



We evaluated the proposed model in a small user study. Four participants were asked to perform an indirect object manipulation task as fast as possible. The goal was to move a point from its initial position into a target area, as shown in Figure 3.1(a). We distinguish between four types of tasks (A, B, C, D) depending on the location of the target area on the screen (see Figure 3.1(b)). For each task, subgoals of the two policies with a minimal number of changes in the movement direction of the object were indicated as small dots.



3.2 Apparatus and System Model

Figure 3.1: Experimental task (a) and mapping of input devices (b). Panels: (a) Task; (b) Mapping of inputs to system reactions and target quadrants A, B, C, and D.

The display has a diagonal of 12.1 inches and a resolution of 1024 × 768 pixels. As input devices we use a single key of a keyboard and a pen tablet, of which only horizontal movements of the pen were used. Hence, we obtain the set of possible inputs $I = \{0, 1\} \times \mathbb{Z}$ with elements $i = (i_k, i_d)$, where $i_k \in \{0, 1\}$ indicates whether the key is pressed (1) or not (0) and $i_d \in \mathbb{Z}$ is the relative movement of the pen in horizontal direction to the left ($< 0$) or to the right ($> 0$).
The position of the point after a state transition is defined by

$$p_{t+1} = \begin{cases} p_t + i_d\, a_0 & \text{if } i_k = 0, \\ p_t + i_d\, a_1 & \text{if } i_k = 1, \end{cases}$$

where $a_0$ and $a_1$ are the two possible movement directions of the object. The mapping between inputs and system state transitions is illustrated graphically in Figure 3.1(b). The mapping was chosen to differ from any standard mapping the participants might know, so that gaze behavior could be studied under uncertain mental models $P_M \neq P_T$ of the users.
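The state transition can be simulated in a few lines. The concrete vectors for $a_0$ and $a_1$ below are hypothetical placeholders (two diagonal directions, in line with the discussion of the diagonal axes in Section 3.6); their actual values are not given here.

```python
from typing import Tuple

Vec = Tuple[int, int]
Input = Tuple[int, int]  # i = (i_k, i_d)

# Hypothetical movement directions a_0 (key released) and a_1 (key pressed).
A0: Vec = (1, 1)
A1: Vec = (1, -1)


def transition(p: Vec, i: Input, a0: Vec = A0, a1: Vec = A1) -> Vec:
    """p_{t+1} = p_t + i_d * a_0 if i_k = 0, and p_t + i_d * a_1 if i_k = 1."""
    i_k, i_d = i
    a = a1 if i_k == 1 else a0
    return (p[0] + i_d * a[0], p[1] + i_d * a[1])
```

Because the direction depends on the key while the distance depends on the pen, a user without a correct mental model cannot predict where the point will go, which is exactly the uncertainty the study relies on.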



Figure 3.2 shows the two policies $\pi_1$ and $\pi_2$ for a task of type D with a minimal number of changes in movement direction of the object. Note that the arrows in Figure 3.2 are not the inputs delivered by the policy functions; they indicate the direction in which the visual representation of the manipulated object moves in response to the respective input of the policy.



Participants’ eye movements were captured with an SMI iView X™ HED [Sen03] head-mounted eye tracking system. To transform gaze positions into screen coordinates, a video-based marker detection and tracking system was used (see [BVK09] for details).



3.3 Procedure 

The participants had to execute 60 tasks of different types (A, B, C, D) in random order; the order was the same for all participants. Before the first task, every user was told that only horizontal movements of the pen are interpreted by the system and that the key of the keyboard has “some additional function”. Additionally, users had time to practice using the pen in a separate application, in which horizontal movement of the pen changed the background color of the screen. Hence, no mental model of the mapping between input and object movement could be built during practice.

Figure 3.2: Two policies for a task of type D with a minimal number of changes in movement direction of the manipulated object: (a) policy $\pi_1$ for task D, (b) policy $\pi_2$ for task D.



3.4 Data Analysis 



In the study described here, we focus on analyzing gaze movements that occur before any object movement. Such pre-object gaze movements are particularly interesting for estimating the user’s intention and detecting model mismatches.



Both fixations and saccades are classified as described in Section 2.3. In order to obtain an estimate of the user’s intention, the information from all pre-object gaze movements is accumulated in

$$\Pi^e = \{\pi_{j_1}, \dots, \pi_{j_m}\}$$

with

$$j_k \in \Big\{\, i \in \{1, \dots, N_\pi\} \;\Big|\; \#P^{f,i} + \#P^{s,i} = \max_{l}\big(\#P^{f,l} + \#P^{s,l}\big) \Big\},$$

where $\#P^{f,i}$ and $\#P^{s,i}$ denote the number of pre-object proactive fixations and saccades, respectively, assigned to policy $\pi_i$. In our experiments we only consider the two policies shown in Figure 3.2 ($\Rightarrow N_\pi = 2$). If there are no proactive gaze movements, the intention cannot be estimated and $\Pi^e = \{\}$. If the number of fixations/saccades is equal for two or more distinct policies ($|\Pi^e| > 1$), no decision for a single policy is possible either.
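Computing $\Pi^e$ amounts to an argmax over per-policy counts that keeps ties and returns the empty set when no proactive movement was observed. A minimal sketch, assuming the counts $\#P^{f,i}$ and $\#P^{s,i}$ are given as dictionaries keyed by policy index $1, \dots, N_\pi$ (the function name is hypothetical):

```python
from typing import Dict, Set


def estimate_intention(n_fix: Dict[int, int], n_sac: Dict[int, int],
                       n_policies: int) -> Set[int]:
    """Pi^e: indices of the policies with the most proactive pre-object
    fixations plus saccades; empty if there are no proactive movements."""
    counts = {i: n_fix.get(i, 0) + n_sac.get(i, 0)
              for i in range(1, n_policies + 1)}
    best = max(counts.values())
    if best == 0:
        return set()
    return {i for i, c in counts.items() if c == best}
```

A decision for a single policy is only possible when the returned set has exactly one element.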

The policy actually taken by the user is determined by aligning the sequence of inputs $i_{t_1}, \dots, i_{t_{N_I}}$ with the different policies. We calculate a score $s_i$ for every policy $\pi_i$ of a given task according to

$$s_i = \sum_{k=1}^{N_I} \big\langle p_{t_k+1} - p_{t_k},\; w^i_{t_k} \big\rangle .$$

The policy $\pi^{\mathrm{true}} = \pi_{\hat\imath}$ with $\hat\imath = \arg\max_i s_i$, i.e., the policy with the highest score, is considered to be the policy the user chose for executing the task.

We assume a mismatch between the mental model of the user and the real system if the estimated user intention differs from the policy actually taken:

$$\text{model mismatch} := (\Pi^e \neq \{\}) \wedge (\pi^{\mathrm{true}} \notin \Pi^e)$$
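The score $s_i$, the selection of $\pi^{\mathrm{true}}$, and the mismatch test can be sketched as below, assuming the score is the sum of inner products between the observed object displacements and the displacements $w^i_{t_k}$ predicted by each policy. The function names are hypothetical, and ties in the argmax are resolved by taking the first maximum.

```python
from typing import Dict, Sequence, Set, Tuple

Vec = Tuple[float, float]


def policy_scores(moves: Sequence[Vec],
                  predicted: Dict[int, Sequence[Vec]]) -> Dict[int, float]:
    """s_i = sum_k <p_{t_k+1} - p_{t_k}, w^i_{t_k}> for every policy i.

    moves[k] is the observed displacement after input k,
    predicted[i][k] the displacement predicted by policy pi_i.
    """
    return {i: sum(m[0] * w[0] + m[1] * w[1] for m, w in zip(moves, preds))
            for i, preds in predicted.items()}


def true_policy(scores: Dict[int, float]) -> int:
    """Index of the policy with the highest score (first maximum on ties)."""
    return max(scores, key=scores.get)


def model_mismatch(intention: Set[int], pi_true: int) -> bool:
    """An intention could be estimated, but it excludes the executed policy."""
    return len(intention) > 0 and pi_true not in intention
```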



3.5 Results 



Figures 3.3 and 3.4 show target areas, object movements, and the last pre-object gaze positions in each task for two different participants. Data from earlier tasks of the experiment are drawn in light gray, data from later tasks in dark gray and black. The left panels show only data from tasks of type B, the right panels only data from tasks of type D. For both types of tasks, the policies used by the users converged towards a particular policy. However, gaze behavior differed considerably between users and between tasks.




are distributed along $\pi_1$, while the most frequently used policy is $\pi_2$. Hence, the figure for task B indicates a model mismatch, since the pre-object gaze movements show that the user anticipated a wrong system reaction.



The user whose data are shown in Figure 3.4 (“User 2”) used almost exclusively reactive gaze behavior at the beginning of the task sequence. For both tasks, the last pre-object gaze positions are concentrated around the initial object position. Additionally, the object paths in Figure 3.4 deviate much more from one of the two “optimal paths” than those in Figure 3.3. This indicates that the user with proactive gaze behavior has built a better mental model than the user with reactive gaze behavior or, conversely, that the user with the better model uses proactive gaze behavior while the user with the worse model uses reactive gaze behavior.

Figure 3.3: Object paths and last pre-object gaze positions for “User 1”, showing predominantly proactive gaze behavior for tasks of type B (a) and D (b).



At the end of the task sequence for task D, “User 2” switched from reactive to proactive gaze behavior (see Figure 3.4(b)). This also supports the proposition that a better mental model built over time leads to more proactive gaze behavior.



In order to evaluate the proposed model more formally, all pre-object fixations and saccades are classified as described in Sections 2.3 and 3.4. Further, the user’s intention is estimated from the gaze data as the set $\Pi^e$ of most probable policies and compared with the true policy chosen by the user. Note that $\Pi^e$ is calculated not only from the last pre-object fixation shown in Figures 3.3 and 3.4, but from all pre-object fixations and saccades, as described in Section 3.4. The results of this evaluation for the data collected from the four participants are shown in Figure 3.5. The data in Figure 3.3 correspond to User 1 and the data in Figure 3.4 to User 2.



Pre-object gaze movements of User 1 and User 4 deliver good estimates of the policy actually chosen for task D. The proportion of tasks of type D with $\Pi^e \neq \{\}$ is 95.83% for User 1 and 66.67% for User 4. For task B this percentage is lower for User 1 (52.12%) and similar for User 4 (69.57%). However, User 4 in particular shows many proactive fixations and saccades indicating a different policy than the one actually chosen (see Figure 3.5(a)). This indicates the existence of a model mismatch for task B.

Figure 3.4: Object paths and last gaze positions before the first object movement for User 2, showing predominantly reactive gaze behavior for tasks of type B (a) and a switch from reactive to proactive behavior for tasks of type D (b).



For User 2, the proportion of tasks with $\Pi^e \neq \{\}$ increased over time from 50% for task D (34.78% for task B) in the first half of the task sequence to 75% (60.87% for task B) in the second half (complete task sequence: 62.5% for task D, 47.83% for task B). However, as Figure 3.5(b) shows, User 2 produced many mismatches between gaze and actual object movement at the end of the task sequence.



User 3 shows a development of gaze behavior similar to that of User 2 (at least for task B), but with a lower proportion of tasks in which proactive gaze behavior allowed the user’s intention to be estimated (task B: 1st half 17.39%, 2nd half 60.87%, total 39.13%; task D: 1st half 33.33%, 2nd half 33.33%, total 33.33%).



3.6 Discussion 

The above results show that the proposed model provides a good basis for analyzing gaze data in the context of a task and in a dynamic environment. Even the small study used to evaluate our model revealed some interesting findings, which need to be verified in further, more extensive studies.

Figure 3.5: Comparison of estimated policy and true policy chosen by the users for all tasks of type D. Bars above the horizontal axis indicate a match of estimated intention and true policy; bars below indicate a mismatch. Missing bars indicate that no intention estimation was possible. Panel (a): Task of type B; vertical axis: match/mismatch.



Building the correct mental model seems to be more difficult for task B than for task D. All four participants showed more mismatches for task B, and more proactive fixations and saccades were observed for task D than for task B. The difficulties users had with task B could be explained by a wrong mental model of the function of the key, namely that the key determines whether the point moves upwards or downwards, instead of along one of the two diagonal axes as shown in Figure 3.1. However, when asked after the experiments, all users explained the function of the key correctly.



Gaze behavior varied strongly among users. However, similarities between User 1 and User 4, as well as between User 2 and User 3, could also be observed. The differences between the two groups of users seem to be correlated with the quality and certainty of the users’ mental models. If verified in further studies, this dependency could be used to adapt the human-computer interface automatically to the quality of the user’s mental model. For users with a correct mental model, proactive gaze behavior could be used for intention recognition and for realizing proactive interaction techniques. Users with an uncertain or wrong mental model could be supported by adequate hints that help them improve or correct their mental representation of the system.



Based on the considerations in Section 2.2, we would expect the user to look at a certain location on the screen for one or both of two reasons: “control of input” and/or “verification of system reactions or states”. The results of our study can be interpreted as follows: the more proactive gaze behavior of “experienced” users is caused by a switch from verifying system reactions for learning purposes to indirectly controlling the absolute position of the object at key points of the task.



4 Conclusion 

The proposed model allows gaze data to be analyzed in the context of the task currently executed by the user. Since uncertainty in the user’s mental model can be modeled, it also provides a basis for studying the influence of learning processes on natural gaze behavior. We have demonstrated how it can be used to explain “why we look where we look” and as a general framework for analyzing gaze data in context.

The evaluation of the model in a small user study has shown that the various kinds of visual control strategies of the user can be captured by the model and thus made available for further analysis. We have observed that gaze behavior is both task- and user-specific. The results indicate that a better mental model leads to more proactive gaze behavior, which allows for intention recognition when aligned with the task. This enables the design of new proactive user interfaces and the improvement of uncertain input devices (see also [BVK09]). However, the study also revealed that not only a correct but also a wrong mental model leads to proactive gaze behavior. We have demonstrated that such a model mismatch can be detected early with the proposed framework, which allows for designing adaptive systems that provide help to the user automatically.

In future work, the proposed model needs to be evaluated on larger data sets and with more complex tasks. In this paper we have focused only on pre-object gaze data. However, the model can also be used for analyzing gaze data during object movement phases, which contain further valuable information about the user’s cognitive processes. Finally, new user interfaces based on the proposed model need to be implemented and evaluated. In this context, the proposed model also provides a good basis for real-time recognition of the user’s intention and cognitive states.
Abstract: The key capabilities of a mobile robot are to determine where it is located and to obtain an idea of its surroundings. Precise self-localization requires several sensors, as no single sensor is sufficient due to noisy measurements. The data from the sensors are fused into a combined estimate, resulting in a more accurate localization. Since in most situations positioning sensors like odometry and GPS alone are still insufficient, for example for collision avoidance, it is preferable to incorporate exteroceptive sensors as well. Furthermore, these sensors can be used to localize the robot in a map. If no map of an area is available, the robot has to build it while exploring the environment. This exploration and mapping is a very challenging problem because of noise in the sensor measurements; several approaches exist to solve this so-called simultaneous localization and mapping (SLAM) problem.



1 Introduction 

The AMROS (Autonomous Multisensoric Robots for Security Applications) system, currently being developed at the Fraunhofer Institute for Information and Data Processing (IITB), is an autonomous mobile robotic system for multi-sensor outdoor surveillance of real estate and building complexes [EMF+07]. To perform autonomous surveillance and security inspection, the robot must be able to patrol around a building or navigate to certain points of interest.

The key capabilities the mobile robot needs in order to patrol autonomously are determining where it is located and performing path planning and path following. For precise self-localization, several sensors are combined by means of multi-sensor fusion, resulting in a more accurate localization. In addition to positioning sensors, exteroceptive sensors are incorporated as well, for example for collision avoidance. Furthermore, the exteroceptive sensors can be used to localize the robot in a map. Such maps can be built either from raw sensor data, yielding a dense map, or with an additional feature extraction step, resulting in a map consisting of distinct landmarks. In the former case, the dense map can be interpreted as a single landmark.

1.1 Sensors for Localization 

To localize itself in the environment, the mobile robot platform is equipped with several sensors. The simplest way to perform localization is dead reckoning, i.e., using the odometry sensors (wheel encoders) of the robot and incrementally incorporating the measured revolutions of the robot’s wheels from a known starting position. As these encoders only deliver relative measurements and all sensors are subject to errors, the uncertainty of the pose grows without bound with the distance covered. In outdoor environments, navigation sensors like GPS and compass can be used. They measure absolute quantities and therefore do not suffer from error accumulation, but they are prone to local disturbances caused by surrounding objects. The measurements of the compass are degraded by disturbances of the terrestrial magnetic field, e.g., by metal fences or the ventilation fans of air conditioning systems. When a Differential GPS receiver is used, the remaining significant source of error is multipath propagation due to reflections and shadowing by large objects like buildings. As the multipaths depend on the constellation of the receiver and the satellites relative to nearby reflecting surfaces, the errors are time-variant and vary locally [EFK08].
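Dead reckoning can be illustrated with a differential-drive kinematic model. The platform’s kinematics are not described here, so the model, the parameter `wheel_base`, and the mid-point heading approximation are assumptions for illustration only.

```python
import math
from typing import Tuple

Pose = Tuple[float, float, float]  # (x, y, heading theta in radians)


def dead_reckoning_step(pose: Pose, d_left: float, d_right: float,
                        wheel_base: float) -> Pose:
    """Integrate one pair of wheel-encoder increments (distances travelled
    by the left and the right wheel) for a differential-drive robot."""
    x, y, theta = pose
    d = 0.5 * (d_left + d_right)               # distance of the axle midpoint
    d_theta = (d_right - d_left) / wheel_base  # change of heading
    mid = theta + 0.5 * d_theta                # mid-point heading approximation
    return (x + d * math.cos(mid), y + d * math.sin(mid), theta + d_theta)
```

Every call adds the encoder errors of one step to the pose without any correction, which is why the pose uncertainty grows with the distance covered.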

In addition, sensors that observe the environment, like a laser scanner or a camera, can be used for localization in a map. For navigation, a map is also advantageous, as it enables path planning beyond the current sensor coverage. The required map can be provided by an official site plan of the real estate. However, the main disadvantage of this approach is that it is not guaranteed that the sensors detect the same objects as specified in the given map. Hence, in most cases it is better to create a map from the sensor data the robot gathers while exploring the environment for the first time. This is a very challenging problem because of uncertainties in the sensor measurements, as can be seen in Figure 1.1. Because all sensor measurements contain errors and the estimate of the robot’s pose depends on the map and vice versa, the resulting map may become inconsistent [DWB06a].

Figure 1.1: Naive recording of the sensor data and the robot’s path, estimated by odometry only.

The map in Figure 1.1 was built as an occupancy grid map by naive recording of the sensor data, i.e., the robot’s odometry and a 180° 2D laser scanner (SICK LMS 200). It represents the world discretely as a 2D matrix whose cells are considered independent and contain the probability of being occupied. A simple frequentist algorithm is used to update the cells: each cell $c^{x,y}$ of the map contains two counters, one for how often the cell has been inspected ($i^{x,y}$) and one for how often the cell has been found occupied ($o^{x,y}$). Each laser beam can be described by a line from the sensor’s origin to its endpoint. The cell containing the endpoint is deemed occupied, so $i^{x,y}$ and $o^{x,y}$ are both incremented. For all other cells crossed by the line, only $i^{x,y}$ is incremented. The probability of the cell being occupied is



$$p\big(c^{x,y}\big) = \frac{o^{x,y}_k}{i^{x,y}_k} \qquad (1.1)$$



where $k$ denotes the time. In Figure 1.1, dark areas show a low probability of occupancy, bright areas a high probability, and gray indicates that no information is available. It can clearly be seen that the noise in the sensor data results in an inconsistent map, as the robot’s return path shows a second corridor where there is only one.
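The counting update behind (1.1) can be sketched as follows. Tracing beams with Bresenham’s line algorithm, storing the counts in dictionaries, and counting the sensor’s own cell as inspected are assumptions; how the crossed cells are determined is not specified here.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def bresenham(c0: Cell, c1: Cell) -> List[Cell]:
    """Grid cells crossed by the straight line from c0 to c1 (inclusive)."""
    (x0, y0), (x1, y1) = c0, c1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    cells: List[Cell] = []
    while True:
        cells.append((x0, y0))
        if (x0, y0) == (x1, y1):
            return cells
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


class CountingGrid:
    """Frequentist occupancy grid: per cell, how often it was inspected (i)
    and how often it was found occupied (o); p(occupied) = o / i."""

    def __init__(self) -> None:
        self.inspected: Dict[Cell, int] = {}
        self.occupied: Dict[Cell, int] = {}

    def add_beam(self, origin: Cell, endpoint: Cell) -> None:
        for c in bresenham(origin, endpoint):  # every crossed cell was inspected
            self.inspected[c] = self.inspected.get(c, 0) + 1
        # only the cell containing the beam endpoint counts as occupied
        self.occupied[endpoint] = self.occupied.get(endpoint, 0) + 1

    def p_occupied(self, c: Cell) -> Optional[float]:
        """Occupancy probability, or None for cells without information."""
        i = self.inspected.get(c, 0)
        return None if i == 0 else self.occupied.get(c, 0) / i
```

Since the cells are updated independently and the beam origin comes from odometry alone, pose errors are written straight into the map, which produces the duplicated corridor seen in Figure 1.1.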



2 Probabilistic Simultaneous Localization and Mapping

To build a precise and correct map, the robot has to localize itself in the map registered so far, which contains errors, and at the same time update this map continuously.

Figure 2.1: The SLAM problem as a Dynamic Bayes Network.



Consequently, the resulting map becomes inconsistent unless the dependencies between the uncertainty in the pose and the errors in the map are taken into account. By observing areas or features of the map several times, the uncertainties in the map decrease and the map converges to a better solution. Several approaches to probabilistic mapping exist that solve this so-called simultaneous localization and mapping (SLAM) problem [DWB06a]. The SLAM problem and the dependencies between its state variables can be modeled as random variables in a Dynamic Bayes Network (DBN), see Figure 2.1 [TBF05]. The path $x$ of the robot is depicted in green, while $k$ denotes the time steps. The position sensor measurements are treated as control inputs $u$ and are shown in yellow. The measurements $z$ of the sensors observing the environment, so-called exteroceptive sensors, are shown in orange. They measure features $\theta$ of the environment, which are also called landmarks. The data association $n$ is a very important auxiliary requirement, as it represents the correct associations between observations and landmarks in the map. A SLAM algorithm has to estimate the full SLAM posterior



$$p(x^k, \Theta \mid z^k, u^k, n^k),$$



where $\Theta = \theta^N = [\theta_1, \theta_2, \dots, \theta_N]$ is the map consisting of all $N$ landmarks $\theta$. A variable with a superscripted time index, like $x^k$, denotes the set of all its instances up to time $k$.

For an algorithm solving the SLAM problem to converge to a better solution when re-observing known parts of the map, several criteria must be met. The first is that the dependencies between the pose of the mobile robot and the map (or the landmarks), as well as among the landmarks themselves, are incorporated. Second, robust data association is important, as wrong associations can lead to catastrophic failures. Third, in dynamic environments moving objects must be detected, since they have to be removed from the sensor data prior to mapping. If data from dynamic objects are used as landmarks, the map will most likely contain spurious information or may even be completely incorrect.

Solving the DBN problem directly is computationally demanding, as the effort grows exponentially with time $k$. However, the problem can also be seen as a Markov chain, cf. Figure 2.1. Given the Markov assumption that the present state is fully described by the previous state and the current measurements, the problem can be computed recursively and the map is built incrementally. This recursive update scheme is also known as the Bayes filter. In most cases the recursive update cannot be solved in closed form, so approximations are inevitable. By restricting the SLAM posterior, the motion model, and the measurement model to multivariate Gaussian distributions, the well-known Extended Kalman Filter (EKF) can be used to estimate the full SLAM posterior [DNC+01]. Neglecting the state of the robot, its memory requirement is quadratic in the number of landmarks, while its computational complexity is even greater than $O(N^2)$, as the covariance matrix has to be inverted. Due to the high dimensionality of the problem, the EKF can become computationally infeasible for maps containing many landmarks. Furthermore, only unimodal distributions can be modeled and only a single data association hypothesis can be maintained. This renders the algorithm susceptible to ambiguous situations, and incorrectly incorporated observations cannot be removed. If many wrong associations occur, the algorithm will diverge. To reduce the impact of wrong data associations in target tracking applications, the Multi Hypothesis Tracking (MHT) approach has been introduced [Rei79]. In situations where several association hypotheses are probable, a new EKF is instantiated for each hypothesis, multiplying the computational effort. To keep the number of filters from increasing steadily, heuristics are needed to remove improbable hypotheses over time.

In contrast to the Kalman filter approaches, which are parametric approximations, an alternative method is to discretize the probability distribution. A complete discretization of the state space would lead either to a very coarse representation or to an extremely high demand for computational power and memory. A more efficient discretization strategy can be achieved with particle filters. Particle filters represent distributions with a finite set of samples, where the density of the samples is proportional to the probability of the state. They are capable of modeling multi-modal distributions and implicitly incorporate multiple data association hypotheses. On the other hand, standard particle filters are only suitable for low-dimensional problems, as in the worst case the required number of particles grows exponentially with the dimension of the state space. Since the dimensionality of the SLAM problem also grows with the number of landmarks, the particle filter may become inapplicable. However, it is possible to condition the landmark estimates on the robot’s path, allowing the SLAM problem to be factored into independent landmark estimation problems; this is an instance of the Rao-Blackwellized particle filter [DdFMR00].



3 Rao-Blackwellized Particle Filter SLAM 

3.1 Factorization of the SLAM Posterior 



An implementation of a Rao-Blackwellized particle filter algorithm in the context of SLAM, known as FastSLAM, was presented by Thrun et al. [MTDW02]. Looking at the Dynamic Bayes Network in Figure 2.1, it becomes evident that if the robot’s path is known, the landmarks become conditionally independent of each other. Applying this condition to the particle filter, each particle represents a hypothesis of the true path of the robot, so the landmarks can be estimated independently, i.e., the SLAM posterior can be factored as follows [TBF05]:

$$p(x^k, \Theta \mid z^k, u^k, n^k) = p(x^k \mid z^k, u^k, n^k) \prod_{n=1}^{N} p(\theta_n \mid x^k, z^k, u^k, n^k).$$

The particles sample the distribution of the path posterior

$$p(x^k \mid z^k, u^k, n^k). \qquad (3.1)$$



Each particle has $N$ independent landmark estimators attached to it, meaning that every particle carries its own map. If the observation errors of the landmarks are modeled as Gaussians, these estimators are EKFs. The cross-correlations between landmarks do not have to be maintained explicitly as in full EKF-SLAM; they are incorporated implicitly through the condition of known paths.



It is not possible to sample directly from the target distribution, i.e., the path posterior (3.1). Therefore, the M samples are drawn from the probabilistic motion model $p(x_k \mid u_k, x_{k-1}^{[m]})$ for each particle $[m]$ separately. Propagating the particles with the motion model, combined with the assumption that the particles from the previous time step are distributed according to $p(x^{k-1,[m]} \mid z^{k-1}, u^{k-1}, n^{k-1,[m]})$, yields the proposal distribution

$$p(x^{k,[m]} \mid z^{k-1}, u^k, n^{k-1,[m]}) = p(x_k^{[m]} \mid u_k, x_{k-1}^{[m]})\; p(x^{k-1,[m]} \mid z^{k-1}, u^{k-1}, n^{k-1,[m]}).$$
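The per-particle sampling step can be sketched as follows. This is a minimal illustration assuming a simple odometry model (a forward distance and a heading change, each corrupted by additive zero-mean Gaussian noise); the model, the function names and the noise parameters are illustrative assumptions, not the motion model used in the report.

```python
import math
import random

def sample_motion_model(pose, u, sigma, rng=random):
    """Draw x_k ~ p(x_k | u_k, x_{k-1}) for one particle.

    Assumed (hypothetical) model: u = (d_forward, d_theta) odometry increments,
    each perturbed by Gaussian noise with std sigma = (s_d, s_theta).
    """
    x, y, theta = pose
    d = u[0] + rng.gauss(0.0, sigma[0])
    dth = u[1] + rng.gauss(0.0, sigma[1])
    theta_new = theta + dth  # rotate first, then translate along the new heading
    return (x + d * math.cos(theta_new), y + d * math.sin(theta_new), theta_new)

def propagate(paths, u, sigma, rng=random):
    # Each path x^{k-1,[m]} is extended by its own, separately drawn pose x_k^{[m]}.
    return [path + [sample_motion_model(path[-1], u, sigma, rng)] for path in paths]
```

Because every particle draws its own noise sample, the particle cloud spreads out over time, which is exactly the growing path uncertainty discussed in Section 3.2.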
In addition, each sample is given an importance weight, which is the ratio of the target distribution and the proposal distribution. This ratio is proportional to the observation likelihood. Thus the set of weighted particles represents the probability distribution of the full SLAM posterior. As the landmark estimators are EKFs, the importance weights can be calculated by means of the innovation, which is the difference between the current observation $z_k$ and the predicted observation $\hat{z}_{n_k,k}^{[m]}$ based on the map:

$$
\begin{aligned}
w_k^{[m]} &= \frac{\text{target distribution}}{\text{proposal distribution}}
= \frac{p(x^{k,[m]} \mid z^k, u^k, n^{k,[m]})}{p(x^{k,[m]} \mid z^{k-1}, u^k, n^{k-1,[m]})}
\propto p(z_k \mid x^{k,[m]}, z^{k-1}, u^k, n^{k,[m]}) \\
&\approx \frac{1}{\sqrt{\left|2\pi Z_{n_k,k}^{[m]}\right|}}
\exp\left(-\frac{1}{2}\left(z_k - \hat{z}_{n_k,k}^{[m]}\right)^{T}\left(Z_{n_k,k}^{[m]}\right)^{-1}\left(z_k - \hat{z}_{n_k,k}^{[m]}\right)\right) \qquad (3.2)
\end{aligned}
$$

with $Z_{n_k,k}^{[m]}$ being the innovation covariance matrix

$$Z_{n_k,k}^{[m]} = H\, P_{n_k,k-1}^{[m]}\, H^{T} + R, \qquad (3.3)$$

with the linearized measurement model $H$, the covariance $P_{n_k,k-1}^{[m]}$ of the observed landmark, and the measurement noise covariance $R$ [MT07]. The observed landmarks $n_k$ are updated separately with an EKF. If several landmarks are observed at once, they can be processed sequentially, as they are conditionally independent.
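The weight computation (3.2) with the innovation covariance (3.3), followed by the EKF update of the observed landmark, can be sketched in plain Python for a single 2-D landmark. The helper names are hypothetical, the measurement model that produces the prediction `z_hat` and the Jacobian `H` is assumed to be given, and NumPy is avoided only to keep the example dependency-free.

```python
import math

def mat_mul(a, b):
    # Plain-Python matrix product for small lists-of-lists.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv2(a):
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def weight_and_update(z, z_hat, mu, P, H, R):
    """Importance weight (3.2) and EKF update for one observed 2-D landmark.

    z, z_hat : observed and predicted measurement (length-2 lists)
    mu, P    : landmark mean and 2x2 covariance before the update
    H, R     : 2x2 measurement Jacobian w.r.t. the landmark, measurement noise
    """
    HP = mat_mul(H, P)
    Z = [[v + r for v, r in zip(row, rrow)]
         for row, rrow in zip(mat_mul(HP, transpose(H)), R)]          # (3.3)
    nu = [z[0] - z_hat[0], z[1] - z_hat[1]]                           # innovation
    Z_inv = inv2(Z)
    maha = sum(nu[i] * Z_inv[i][j] * nu[j] for i in range(2) for j in range(2))
    # In 2-D, sqrt(|2 pi Z|) = 2 pi sqrt(det Z).
    w = math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det2(Z)))  # (3.2)
    K = mat_mul(mat_mul(P, transpose(H)), Z_inv)                      # Kalman gain
    mu_new = [mu[i] + K[i][0] * nu[0] + K[i][1] * nu[1] for i in range(2)]
    KH = mat_mul(K, H)
    I_minus_KH = [[(1.0 if i == j else 0.0) - KH[i][j] for j in range(2)] for i in range(2)]
    P_new = mat_mul(I_minus_KH, P)
    return w, mu_new, P_new
```

When several landmarks are observed at once, this function would simply be called once per landmark and the resulting weights multiplied, in line with their conditional independence.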

If this recursive algorithm runs for a long time, it may suffer from degeneracy of the particle weights, i.e., most of the weights $w_k^{[m]}$ become very small. To avoid this degeneracy, an algorithm called Sequential Importance Resampling (SIR) was introduced by Rubin [Rub88]. Therein, a new set of particles is drawn from the present set by sampling with replacement, with probability proportional to $w_k^{[m]}$. Afterwards, the importance weights are reset to $w_k^{[m]} = 1/M$, which means the set now consists of unweighted samples.
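The SIR step described above amounts to multinomial sampling with replacement. The sketch below uses Python's standard `random.choices`; the function name is hypothetical, and lower-variance schemes such as systematic resampling are common alternatives that the text does not describe.

```python
import copy
import random

def resample(particles, weights, rng=random):
    """SIR step: draw M particles with replacement, with probability proportional
    to the importance weights, then reset every weight to 1/M."""
    M = len(particles)
    drawn = rng.choices(particles, weights=weights, k=M)
    # Deep copies, so that duplicated particles do not share (and later co-modify) one map.
    new_particles = [copy.deepcopy(p) for p in drawn]
    return new_particles, [1.0 / M] * M
```

The duplicates created here are the source of the sample impoverishment discussed next: identical copies carry identical maps and add no diversity.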



3.2 Consequences of Resampling 

The necessary resampling adversely affects the performance and convergence speed of the algorithm. In the resampling step, some particles are discarded while others are duplicated, possibly several times. This so-called sample impoverishment or particle depletion reduces the sample diversity and thus degrades the performance of the algorithm. As the dependencies between the landmarks are not represented explicitly but only through conditioning on a known path, this leads to an underestimation of the landmark covariances and a slower convergence of the algorithm, or even to complete failure.



There are several causes of sample impoverishment. If the robot is equipped with noisy position sensors and accurate landmark sensors, the proposal and target distributions match poorly: many particles receive very low importance weights and are discarded with high probability in the next resampling step. Such a sensor configuration is very common in mobile robotics, as most odometry sensors are not very accurate. An extension of the presented algorithm that deals with this problem, the FastSLAM 2.0 algorithm, was introduced by Montemerlo et al. [MT07]. It addresses the mismatch between proposal and target distribution by introducing a new proposal distribution that incorporates the current sensor measurement. This will be discussed in Section 3.3.



Many samples are also discarded when large loops are closed. Due to the error accumulation of relative position sensors such as odometry, the uncertainty in the robot's path grows while it travels through unknown areas. When closing a loop, the robot re-enters known territory and the pose uncertainty becomes very low, so many particles are deleted from the set. A related aspect of the uncertainty growth when closing large loops is the number of particles required to maintain their diversity: the larger the loop, the more samples are required. To address this issue, the fusion of relative sensors with absolute sensors such as GPS and compass to confine the global error has been investigated in [EFK08].



A further cause is ambiguity in data association when dealing with densely spaced point landmarks or in situations where the pose uncertainty is very high. The latter is especially prominent when closing loops with only relative sensors, as the pose uncertainty steadily increases while the robot drives through previously unknown territory. The problem originates from modeling the landmarks as points: in these situations it can be very difficult to distinguish landmarks, since they can be discriminated only by their positions. An additional appearance-based attribute could therefore improve the robustness of the data association. The idea of an extended data association is presented in Section 4.2.



3.3 Gridmapping with FastSLAM 2.0 

Particle filters yield good results when the proposal and posterior distributions are similar. Sampling from the motion model can therefore be disadvantageous when the robot's motion is very noisy compared to the environment sensors. In this case, many particles are discarded in the resampling step, degrading the performance of the filter.

To obtain a proposal distribution that better matches the target distribution, the most recent sensor observations are incorporated into the proposal distribution. The particles $x_k^{[m]}$ are thus drawn according to $p(x_k \mid x^{k-1,[m]}, z^k, u^k, n^{k,[m]})$. This new sampling distribution is obtained by an EKF estimation step per particle. The prior distribution for the EKF is provided by the linear Gaussian propagation of the particle from the previous time step. The likelihood of the measurement $z_k$ is fused in the standard EKF update step, and a new particle is drawn from the resulting Gaussian.

As before, the path posterior of the previous time step is assumed to be distributed according to $p(x^{k-1,[m]} \mid z^{k-1}, u^{k-1}, n^{k-1,[m]})$. Together with the new sampling distribution, the proposal distribution is now given by the product

$$p(x_k^{[m]} \mid x^{k-1,[m]}, z^k, u^k, n^{k,[m]})\; p(x^{k-1,[m]} \mid z^{k-1}, u^{k-1}, n^{k-1,[m]}).$$

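The importance weights have to be adapted accordingly, since the proposal distribution has changed:

$$w_k^{[m]} = \frac{\text{target distribution}}{\text{proposal distribution}} = \frac{p(x^{k,[m]} \mid z^k, u^k, n^{k,[m]})}{p(x_k^{[m]} \mid x^{k-1,[m]}, z^k, u^k, n^{k,[m]})\; p(x^{k-1,[m]} \mid z^{k-1}, u^{k-1}, n^{k-1,[m]})} \propto p(z_k \mid x^{k-1,[m]}, z^{k-1}, u^k, n^{k,[m]}).$$

In contrast to (3.2), the weight no longer depends on the newly sampled pose $x_k^{[m]}$: it is the measurement likelihood with the new pose integrated out.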
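For intuition, the FastSLAM 2.0 proposal and weight can be worked through in one dimension, where the EKF step reduces to fusing two Gaussians. This is a deliberately simplified, hypothetical setting (scalar pose, measurement likelihood acting directly on the pose, e.g. a Gaussian approximation from scan matching); it is not the report's gridmap implementation.

```python
import math
import random

def fastslam2_proposal(mu_prior, var_prior, z_pose, var_z):
    """Scalar sketch of the FastSLAM 2.0 proposal and weight for one particle.

    Assumed (hypothetical) setting: a 1-D pose, a motion prediction
    N(mu_prior, var_prior), and a measurement whose Gaussian likelihood
    N(z_pose; x, var_z) acts directly on the pose.
    """
    # EKF update: fusing prior and likelihood gives the sampling distribution.
    gain = var_prior / (var_prior + var_z)
    mu_post = mu_prior + gain * (z_pose - mu_prior)
    var_post = (1.0 - gain) * var_prior
    # Weight proportional to p(z_k | x^{k-1,[m]}, ...): the new pose is integrated
    # out, which for Gaussians gives N(z_pose; mu_prior, var_prior + var_z).
    s = var_prior + var_z
    weight = math.exp(-0.5 * (z_pose - mu_prior) ** 2 / s) / math.sqrt(2.0 * math.pi * s)
    return mu_post, var_post, weight

def sample_pose(mu_post, var_post, rng=random):
    # Draw the new particle x_k^{[m]} from the fused Gaussian.
    return rng.gauss(mu_post, math.sqrt(var_post))
```

With a prior N(0, 1) and a measurement at 2 with variance 1, the particle is drawn from N(1, 0.5) instead of from the broad motion prior, which is why fewer particles are wasted.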



As several concurrently observed landmarks are incorporated one after another, the data association is also done sequentially. The association of the first landmark therefore remains difficult, since at this point only the very noisy knowledge from the motion model is available; hence, in FastSLAM 2.0 the order in which several landmarks are incorporated has to be considered [MT07].

An instance of FastSLAM 2.0 was implemented with a laser scanner and occupancy grid maps, similar to the algorithm in [GSB07]. To compute the observation likelihood, a scan matching step is performed per particle to find the best match between the present sensor measurement and the map. The match is evaluated by computing the correlation between a local map built from the present laser scan and the global map according to

$$\rho = \frac{\sum_{x,y} \left(p^{x,y,\text{global}} - \bar{p}\right)\left(p^{x,y,\text{local}}(x_k) - \bar{p}\right)}{\sqrt{\sum_{x,y} \left(p^{x,y,\text{global}} - \bar{p}\right)^2 \sum_{x,y} \left(p^{x,y,\text{local}}(x_k) - \bar{p}\right)^2}} \qquad (3.4)$$

with

$$\bar{p} = \frac{1}{2L} \sum_{x,y} \left(p^{x,y,\text{global}} + p^{x,y,\text{local}}(x_k)\right)$$

being the average map value [TBF05]. Here $p^{x,y}$ is the probability of cell $(x, y)$ being occupied, cf. the occupancy grid equation given earlier; the superscript global denotes the global map and local denotes a local map built from the current laser scan. $L$ is the number of cells covered by both the local and the global map.

Figure 3.1: Map of the best particle and the robot's path in red.



To find the best match, a gradient-based search is performed. Afterwards, a local Gaussian approximation is computed around the maximum found, so that it can be integrated into the EKF-based proposal. The grid maps are updated with the scheme explained in Section 1, assuming independent cells, and the importance weights are computed according to Equation (3.4).
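The map correlation (3.4) can be sketched on sparse grids stored as dictionaries. This is a minimal illustration under stated assumptions: the local map is taken as already transformed with the candidate pose $x_k$, only the $L$ cells covered by both maps enter the sums, and $\bar{p}$ is the mean over both maps on those cells, as in the map-matching formulation of [TBF05]. The function name is hypothetical.

```python
import math

def map_correlation(global_cells, local_cells):
    """Correlation (3.4) between the global map and a local map from one scan.

    global_cells, local_cells: dicts mapping (x, y) cell indices to occupancy
    probabilities; local_cells is assumed to be already placed at pose x_k.
    """
    common = global_cells.keys() & local_cells.keys()  # cells covered by both maps
    L = len(common)
    if L == 0:
        return 0.0
    # Average map value over both maps on the L common cells.
    p_bar = sum(global_cells[c] + local_cells[c] for c in common) / (2 * L)
    num = sum((global_cells[c] - p_bar) * (local_cells[c] - p_bar) for c in common)
    var_g = sum((global_cells[c] - p_bar) ** 2 for c in common)
    var_l = sum((local_cells[c] - p_bar) ** 2 for c in common)
    denom = math.sqrt(var_g * var_l)
    return num / denom if denom > 0 else 0.0
```

A perfect match yields a correlation of 1 and an inverted map yields -1; evaluating this function for many candidate poses per particle is what makes the approach computationally demanding.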



Resampling is necessary to counteract the degeneracy of the importance weights, but it can intensify the particle depletion mentioned earlier. Thus resampling should only be carried out when the particles do not approximate the target distribution well. This is the case when the variance of the importance weights grows, since sampling directly from the target distribution would lead to equal weights for all particles. The quality of the approximation can be estimated by the effective sample size

$$M_{\text{eff}} = \frac{1}{\sum_{m=1}^{M} \left(w_k^{[m]}\right)^2}$$

with normalized weights $w_k^{[m]}$, as formulated by Doucet et al. [DGA00]. Resampling is performed only if $M_{\text{eff}}$ is below a predefined threshold.
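The effective sample size and the resulting resampling decision can be sketched as follows; the threshold of half the particle count is a common choice but an assumption here, since the text only speaks of a predefined threshold.

```python
def effective_sample_size(weights):
    """M_eff = 1 / sum_m (w^[m])^2, computed on weights normalized to sum to one."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)

def needs_resampling(weights, threshold_ratio=0.5):
    # Hypothetical threshold: resample only when M_eff < threshold_ratio * M.
    return effective_sample_size(weights) < threshold_ratio * len(weights)
```

With equal weights, $M_{\text{eff}} = M$ and no resampling is triggered; if a single particle carries all the weight, $M_{\text{eff}} = 1$.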
Figure 4.1: Measurement ambiguity on the left and motion ambiguity on the right. Legend: robot, landmark, measurement uncertainty, pose uncertainty.

Figure 3.1 shows that the algorithm works very well. The sensor data are the same as those used to build the map shown earlier. The SLAM algorithm is capable of re-localizing the robot in the map and thus generates a consistent map. An advantage of this algorithm is that it also models free space. Dense maps also have a great advantage for path planning, as they contain information about free space, while feature-based maps are normally sparse: they contain only landmarks matching a given model extracted from raw sensor data, discarding information about the space between the landmarks. On the other hand, calculating the correlation (3.4) is computationally very demanding. Furthermore, the correlation compares the normalized quadratic distance between maps, which has no physical justification, as it does not model the noise characteristics of laser scanners.





4 Data Association 

A reliable data association between observations and the contents of the map is the second convergence criterion for SLAM algorithms, cf. Section 2. Two factors affect its reliability, namely motion noise and measurement noise, both leading to ambiguities as depicted in Figure 4.1. While measurement noise may lead to single wrong data associations, motion noise may lead to several erroneous associations at once. In addition, motion noise is often stronger, as pointed out above.



4.1 Robust Data Association 

In the case of feature-based maps and a Gaussian error model of the landmark attributes, the probability of an observation can be computed as a function of the innovation:

$$\hat{n}_k^{[m]} = \arg\max_{n} \frac{1}{\sqrt{\left|2\pi Z_{n,k}^{[m]}\right|}} \exp\left(-\frac{1}{2}\left(z_k - \hat{z}_{n,k}^{[m]}\right)^{T}\left(Z_{n,k}^{[m]}\right)^{-1}\left(z_k - \hat{z}_{n,k}^{[m]}\right)\right) \qquad (4.1)$$

with $Z_{n,k}^{[m]}$ according to Equation (3.3). This is an instance of a maximum likelihood (ML) estimator for multivariate Gaussians. If the likelihood of matching an observed landmark to any landmark in the map is below a certain threshold, a new landmark is placed in the map. If this threshold is too high, many landmarks will be erroneously instantiated several times despite already existing in the map; if it is too low, newly observed landmarks will be wrongly associated with existing landmarks in the map.
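The ML association with a new-landmark threshold described above can be sketched per particle as follows. The function names and the threshold value are hypothetical; the predicted measurements and innovation covariances are assumed to come from the landmark EKFs of the particle.

```python
import math

def gaussian_likelihood_2d(nu, Z):
    """Gaussian density of the 2-D innovation nu with covariance Z, as in (3.2)."""
    det = Z[0][0] * Z[1][1] - Z[0][1] * Z[1][0]
    inv = [[Z[1][1] / det, -Z[0][1] / det], [-Z[1][0] / det, Z[0][0] / det]]
    maha = sum(nu[i] * inv[i][j] * nu[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det))

def associate_ml(z, predictions, threshold):
    """Maximum-likelihood association of one observation z for one particle.

    predictions: list of (z_hat, Z) pairs, one per landmark in the particle's map.
    Returns the index of the most likely landmark, or None ("new landmark") if
    even the best likelihood is below the threshold.
    """
    best_idx, best_p = None, 0.0
    for idx, (z_hat, Z) in enumerate(predictions):
        nu = (z[0] - z_hat[0], z[1] - z_hat[1])
        p = gaussian_likelihood_2d(nu, Z)
        if p > best_p:
            best_idx, best_p = idx, p
    return best_idx if best_p >= threshold else None
```

Raising `threshold` makes the function return None more often (more duplicate landmarks); lowering it forces more observations onto existing landmarks, mirroring the trade-off described in the text.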

A particle filter implicitly represents multiple hypotheses over data associations, as the association is done per particle. Particles with wrong data associations obtain low importance weights and are thus more likely to be discarded in a later resampling step. This can be considered as delayed decision making in a statistically justified manner, without the need for heuristics as in MHT. Although this is a great advantage, it can aggravate sample impoverishment when too many wrong data associations occur, and thus indirectly affects the first convergence criterion.

Especially in ambiguous situations, an improvement can be achieved by jointly considering multiple data associations per time step, as is done, for example, in the Combined Constraint Data Association (CCDA) algorithm of Bailey [Bai02]. Another possibility is to further exploit the multi-hypothesis property of the particle filter with Monte Carlo data association. Its key idea is to perform the associations probabilistically, with probabilities proportional to their likelihoods. In this way, it accounts for ambiguous situations where several association hypotheses are probable, but it needs more particles to achieve the same accuracy [MT07].
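Monte Carlo data association replaces the argmax by a random draw. In the sketch below, each candidate association is sampled with probability proportional to its likelihood; giving the "new landmark" hypothesis a fixed likelihood `p_new` is an additional assumption made only for illustration, and the function name is hypothetical.

```python
import random

def associate_monte_carlo(likelihoods, p_new, rng=random):
    """Sample a data association with probability proportional to its likelihood.

    likelihoods: per-landmark observation likelihoods for this particle.
    p_new: hypothetical likelihood assigned to the 'new landmark' hypothesis.
    Returns a landmark index, or None for a new landmark.
    """
    options = list(range(len(likelihoods))) + [None]
    weights = list(likelihoods) + [p_new]
    return rng.choices(options, weights=weights, k=1)[0]
```

Different particles can thus follow different association hypotheses in ambiguous situations, and resampling later prunes the ones that turn out to be wrong.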

A further addition is the use of negative information to remove erroneously instantiated or outdated landmarks from the map, which is beneficial in the case of dynamic objects, erroneous measurements, or changes in the environment. For this purpose, an additional value for the probability of its existence is attached to every landmark. Positive evidence is obtained when a landmark is reobserved, and negative evidence when a landmark of the map is within sensor coverage but is not observed. If this probability drops below a certain threshold, the corresponding landmark is deleted from the map. The existence probability can also be used for delayed instantiation of landmarks by introducing another threshold that has to be exceeded. This is especially useful for FastSLAM 2.0, as the sequential data association is more robust when using landmarks with low uncertainties, i.e., landmarks that have been observed repeatedly [MT07].
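One common way to maintain such an existence probability is a log-odds counter. The sketch below is a minimal illustration under that assumption; the class name, the evidence increments, and both thresholds are hypothetical tuning parameters, not values from the report.

```python
import math

class ExistenceTracker:
    """Existence probability of one landmark, kept in log-odds form."""

    def __init__(self, p_init=0.5, l_pos=0.85, l_neg=-0.4, p_delete=0.1, p_confirm=0.9):
        self.logodds = math.log(p_init / (1.0 - p_init))
        self.l_pos, self.l_neg = l_pos, l_neg
        self.p_delete, self.p_confirm = p_delete, p_confirm

    def update(self, in_sensor_range, observed):
        # Positive evidence: landmark reobserved.
        # Negative evidence: landmark within sensor coverage but not observed.
        if observed:
            self.logodds += self.l_pos
        elif in_sensor_range:
            self.logodds += self.l_neg

    @property
    def probability(self):
        return 1.0 - 1.0 / (1.0 + math.exp(self.logodds))

    def should_delete(self):
        return self.probability < self.p_delete

    def is_confirmed(self):
        # Delayed instantiation: use the landmark only once this threshold is exceeded.
        return self.probability > self.p_confirm
```

Landmarks outside the sensor coverage receive no update at all, so negative evidence is only collected where the landmark should have been visible.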



4.2 Extended Data Association 

In the original FastSLAM algorithms, 2D point features with two coordinates as attributes are used as landmarks, which renders the data association susceptible to ambiguities in areas where landmarks are dense, and especially when closing a loop. The reason for the latter is the growth of uncertainty in the robot's path and map while the robot travels through unknown environments. An additional visual-signature-based or appearance-based attribute could enhance the distinction between landmarks [DWB06b]. Especially in situations where landmarks are densely spaced or the pose uncertainty of the robot is very high, the appearance of the landmarks could greatly improve the data association.

Landmarks are features of the environment and can be modeled as objects with several attributes, i.e., with an object-oriented representation of the landmarks. Attributes with Gaussian error distributions can be incorporated directly into the ML estimator (Equation (4.1)) and the importance weights (3.2). If the attributes are non-Gaussian but independent of the position, the ML estimator is able to incorporate the measurements of position $z_{\text{pos}}$ and appearance $z_{\text{app}}$ of the landmark independently:

$$p\left(z_{\text{pos}}, z_{\text{app}} \mid n\right) = p\left(z_{\text{pos}} \mid n\right)\, p\left(z_{\text{app}} \mid n\right) \qquad (4.2)$$

The likelihood is computable if a measure for the difference between the observation $z_{\text{app}}$ and the expected observation $\hat{z}_{\text{app}}$ can be defined.
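The factorized likelihood above can be illustrated with one possible difference measure. The sketch assumes, purely for illustration, that the appearance attribute is a normalized histogram compared via the Bhattacharyya distance and mapped to an unnormalized Gaussian-shaped likelihood; the report does not specify the attribute, and all names and the width `sigma` are hypothetical.

```python
import math

def appearance_likelihood(z_app, z_app_hat, sigma=0.2):
    """Hypothetical appearance likelihood based on the Bhattacharyya distance
    between two normalized histograms (unnormalized; a constant factor does
    not change which landmark maximizes the likelihood)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(z_app, z_app_hat))
    d = math.sqrt(max(0.0, 1.0 - bc))
    return math.exp(-0.5 * (d / sigma) ** 2)

def combined_likelihood(p_pos, z_app, z_app_hat):
    # Position and appearance are assumed independent, so their likelihoods multiply.
    return p_pos * appearance_likelihood(z_app, z_app_hat)
```

Two landmarks with the same positional likelihood `p_pos` but different appearance now receive clearly different combined likelihoods, which is the intended disambiguation effect.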

Simulations have been performed to compare the data association robustness, and its impact on the overall performance of the SLAM algorithm, between point landmarks and landmarks with an additional appearance-based attribute. The implementation is based on MATLAB code by Tim Bailey, which was extended by an ML data association estimator according to

$$\hat{n}_k^{[m]} = \arg\max_{n}\; p\left(z_{\text{pos}} \mid n\right)\, p\left(z_{\text{app}} \mid n\right). \qquad (4.3)$$

All motions of the robot and all sensor measurements are affected by additive Gaussian noise. For comparison, simulations with position attributes only and with an additional appearance attribute were performed. The corresponding results are shown in Figures 4.2 and 4.3, with the estimated maps of all particles drawn as red dots.

Figure 4.2: Simulation with position attributes only. Legend: true landmarks, estimated landmarks, robot's path.

Figure 4.3: Simulation with additional appearance-based attribute. Legend: true landmarks, estimated landmarks, robot's path.

The simulations were conducted with 200 particles and identical parameters. The robot travels twice along the dark blue path. The real landmarks are depicted in green and their estimates in red. As every particle possesses its own map, 200 maps exist in the particle filter. The following results are aggregated over all maps.
With position-only attributes, 11303 wrong data associations occurred, and an already mapped landmark was erroneously instantiated 4200 times. The robot could not close the loop successfully in the second round, which led to divergence and failure of the mapping process.



With the additional appearance attribute, all data associations were correct, but an already mapped landmark was still erroneously instantiated 430 times, leading to three landmarks being mapped twice. Nonetheless, the robot successfully closed the loop and built a consistent map, except for the three duplicated landmarks. Further improvement could be achieved by jointly considering data association hypotheses, by Monte Carlo data association, and by including negative information, as mentioned in Section 4.1.



5 Conclusion & Outlook 

An introduction to the SLAM problem has been given, and methods of probabilistic SLAM based on Bayesian filters have been discussed. The FastSLAM algorithm, whose properties make it very flexible and powerful, was explained in detail with regard to its advantages and shortcomings. Three approaches to overcoming the limitations of the algorithm have been illustrated. The advantage of an improved proposal distribution has been explained. Besides the incorporation of absolute sensors to confine the global error, the idea of an extended data association has been presented. The addition of an appearance-based attribute allows for a more robust data association. Consequently, particle depletion caused by wrong associations can be alleviated, wrongly instantiated landmarks occur less frequently, and loop closure becomes more robust.

In future investigations, the simulation results will be verified experimentally. As an additional sensor for the appearance-based attribute, a stereo camera in combination with the laser scanner is considered. From the sensor data, landmarks are to be extracted with attributes for position and an additional appearance-based attribute that is independent of the position. The appearance attribute should be independent of the viewing angle and statistically assessable for the data association and the computation of the importance weights. To lift the restriction of independence from viewing angle and position, an auxiliary algorithm coping with partial observability, including the extension, splitting, and merging of landmarks, has to be considered later on.

The object-oriented modeling of the landmarks allows for a straightforward integration of different heterogeneous sensors. As a minimum requirement, the position must be extractable from the sensor data in order to place the landmarks in a map. Thus the SLAM algorithm is capable of fusing information from different sensors into a single representation, for example combining sparse feature-based (landmark-based) maps with dense grid maps.
Abstract: Surveillance systems have become increasingly powerful. Conventional camera-based systems are extended with all kinds of sensors, the number of data sources increases, hardware and algorithms improve, and data can potentially be shared between interlinked networks. Smart surveillance systems take advantage of these developments and not only threaten privacy; they also provide an opportunity to achieve data and privacy protection on a new level. The current legal situation for 'obsolete' surveillance deployments has not been explored and is quite heterogeneous. Hence the Fair Information Principles (FIP) are still the minimum privacy requirements for surveillance systems.

This contribution identifies the key challenges for security and privacy in smart surveillance that must be mastered in any future-proof system. Subsequently, two potential solutions are presented that address two of them. Privacy issues are addressed by the Privacy Manager (PM), a framework for privacy enforcement in smart surveillance architectures that achieves compliance with the FIP. Authenticity of surveillance sensors is a security challenge, and a Web of Trust for smart Surveillance Sensors is proposed that can be established between surveillance operators.



1 Introduction 

Many factors, such as decreasing prices, increasing capabilities, and the 'war on terror', have led to a growing number of surveillance installations. The major reason for deployment is crime prevention, and most systems are still video-based. Legacy systems are typically closed-circuit, have few (analog) cameras and do not assist the user. Modern installations are IP-based, can integrate a high number of cameras and assist the operator. Systems are becoming more powerful and more frightening, but at the same time also more vulnerable. For surveillance solutions to succeed, security and privacy objectives must be achieved.



Surveillance is popular in the United Kingdom, and the current number of cameras can only be 'guesstimated': estimates range from one million to 4.2 million. By comparison, Germany has about 30,000 cameras in public places and 400,000 in industrial environments. In some places, e.g., the central stations in Frankfurt (150 cameras) and Leipzig (120 cameras), the concentration of surveillance sensors is comparable to the UK. Even though these impressive numbers¹ are estimates, an extreme tendency towards surveillance is undeniable, and it must be accepted that the cameras will not disappear. Conventional surveillance systems cannot handle the mass of information generated by the increasing number of sensors, and smart surveillance has become a popular area of research. Several approaches have been presented (e.g., [HBC+05], [BEE+08]).



The paper is organized as follows. After a short motivation for addressing privacy and security issues, the author shows why a holistic approach is required to achieve security and privacy in modern surveillance systems. Afterwards, the identified key challenges for privacy and security are highlighted and two potential solutions are presented. Finally, the solutions are discussed and future work is proposed.



2 Motivation 

Conventional surveillance systems consist of n cameras, a video recording server, and a user interface that can either display one or more of the n cameras or be used to search the stored video information. Most of these systems still contain many analog cameras and no other sensors. Such installations can be secured by physical isolation and access controls. Furthermore, manipulation of the collected data is difficult and cost-intensive, and hence not worthwhile in most cases. To reduce deployment costs, modern installations are integrated into existing IT infrastructures, which allows easy usage and data exchange. On the other hand, such a surveillance system is exposed to security attacks, and malicious operators can relate personal data to other data sources. Due to the open nature of modern surveillance deployments, they can easily be extended with new sensors and interconnected to form huge surveillance networks that are used by different parties. In the UK, surveillance systems are run jointly by the municipality, private security providers, and the police. The state has lost track of existing deployments.

¹ References and more details can be found in [VB09].
These modern systems, which get smarter inch by inch, threaten privacy and data protection. To prevent 'total surveillance' and misuse of personal data, new solutions for privacy and security are required. These solutions must comply with the law and must be accepted by society; hence an interdisciplinary (holistic) approach to surveillance is required. The identified key challenges in security and privacy are highlighted in Section 5. All of these challenges must be mastered to develop future-proof surveillance solutions that can be used in practice. Technological developments, such as the approaches presented in this work, must be discussed with lawyers, sociologists, and system operators.



3 Related Work 

Privacy in surveillance is a recent area of research, and new solutions are required. In video surveillance, some approaches exist to ensure privacy and security; most of them blur regions of interest (RoI) that might imperil privacy. In [SPH+05], Senior et al. propose a “privacy-preserving console” for video surveillance. The console re-renders the video stream and hides sensitive details detected by video analysis. Depending on the authorization level, access is granted to re-rendered videos (e.g., with blurred faces or even enriched with additional information) or to the raw video stream. They also propose a “privacy cam”, which processes the video sources and transmits encrypted information streams. In [CB07], Chattopadhyay and Boult also present a privacy cam, which is implemented on a Blackfin DSP and blurs RoI based on PICO [Bou05]. Another scrambling approach is presented in [DE06]. Fleck's approach to privacy [FS08] is based on smart cameras, which transmit events instead of video data. Fidaleo et al. present in [FNT04] a privacy-enhanced software architecture with a centralized server that hosts a privacy buffer, which can remove private or identifiable information from the stream.

Approaches for security also exist, and, just as in privacy research, they focus on video surveillance systems. One way to provide confidentiality, integrity and authenticity (CIA, e.g., [DNVHC05]) is to use proven video-independent solutions, such as symmetric and asymmetric encryption, signatures, certificates and public key infrastructures (PKI), and existing security protocols (SSL, IPSec, Kerberos, etc.). However, for video data, more specific approaches have been proposed that take advantage of video characteristics. To ensure the authenticity of images and video, much research has been done in the area of (robust) watermarking, e.g., [DF02], [HTTP 111]. To achieve confidentiality of transmitted video data, several approaches exist that achieve better performance by utilizing video characteristics, e.g., [CP09].
4 Holistic Approach



To develop efficient (i.e., adequate for several specific surveillance tasks) and accepted surveillance technologies, a holistic approach to privacy and security is required that considers legal and social aspects. Generally, technology advances first; subsequently its real potential emerges, alarms society, and is then limited by law. On the one hand, relevant technologies improve with great speed, and the gap to areas such as social networks or ubiquitous applications is closing. On the other hand, the perception and assessment of privacy are changing (extensive use of Twitter, Facebook, etc.), and the question is whether data protection will exist in the future; if it does, a change towards enhanced data preservation is probable. Besides law and technology, social acceptance is essential for surveillance technology. Surveillance must become more transparent to gain more confidence and must redress the balance of the asymmetric relationship of vision between data controllers and data subjects [HT04]. Observed subjects need easy ways to interact with (or control) the system. Hence a holistic approach is required that considers law and social acceptance from the beginning. To achieve a future-proof solution, a control loop must be established between the three disciplines. It must be assumed that the potential of surveillance technology will be exploited, even if this is prohibited by law. Prohibition alone is therefore insufficient: security- and privacy-compliant solutions must be available, affordable and easy to use, so that there is no incentive to purchase and abuse an overpowered surveillance system. It is important to consider social and legal changes that might happen or would be desirable in the future, and surveillance systems must follow realistic requirements. Technological enhancement also provides new possibilities to achieve social acceptance and even better privacy [VB09].



5 Key Challenges in Privacy and Security 



This work highlights the security and privacy challenges that must be mastered in modern surveillance systems; seven key challenges exist, four for security and three for privacy. The exact requirements for future surveillance systems are uncertain. Hence a holistic approach must be followed that covers current tendencies and surveillance systems. The approach must be adaptable to specific surveillance scenarios, i.e., privacy and security guidelines must be adaptable to the circumstances and the surveillance task. The seven identified key challenges are described in the following.
5.1 Security

As modern surveillance systems are distributed and IP-based, they face the same security issues as distributed systems (Secure Architecture). Additionally, other issues arise that result from the flexibility and size of modern surveillance systems (Flexible Architecture and Trust), the interconnection of surveillance systems and sharing of information (Access Controls and Information Flow), and the trust of the users and validity in court, respectively (Certification).

Secure Architecture: It must be guaranteed that surveillance data cannot be stolen or changed, and it must be infeasible to prevent data from being stored (authenticity, integrity, availability). Any access to the storage must be logged, and access to data must be granted according to the proper authorization level. Securing a single component is manageable, but securing an entire surveillance deployment is a difficult task, which becomes even more complex in huge or flexible systems (changing users, tasks, or sensors). To validate security properties and to build trust, deployments can be built using an open architecture or by certifying the entire system.

Certification: Up to now, no appropriate certification for modern surveillance systems exists, and there is no international standard for certifying such systems. It is currently assumed that manipulation is very challenging and can be detected by experts. Even if it is hard to gain access to the system, manipulation is becoming easier, and the first court case about the authenticity of video data is only a matter of time. An affordable international certification for entire systems is required. Components from different vendors must be interoperable, and system operators must be assisted during deployment (selection of protocols, sensors, etc.).

Access Controls and Information Flow: Access controls must prevent unauthorized
access to any information provided by a surveillance system, including
raw data and metadata. Any access by a user or task must be granted according
to least privilege, i.e., only data concerning a specific surveillance task or
even subtask is accessible, and access should only be provided as long as necessary.
Dynamic allocation of authorizations is challenging in huge, flexible networks or
if data is exchanged between surveillance systems. It must be ensured that data
from different tasks cannot be combined and that any forbidden information flow is
prevented. Besides access controls for data access, access controls for the injection
of sensor data must also be established.
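The least-privilege rule above (access bound to a task, limited in time, denied by default) can be sketched as a small access-control table. This is a minimal illustration, not the NEST implementation: the class names `Grant` and `TaskAccessControl`, the data categories and the explicit `now` parameter are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One authorization: a principal may read one data category for one task."""
    principal: str      # user or service
    task: str           # surveillance task or subtask the grant is bound to
    category: str       # hypothetical data category, e.g. "raw_video"
    expires_at: float   # absolute time after which the grant is void


class TaskAccessControl:
    """Hypothetical task-scoped access control with default deny."""

    def __init__(self):
        self._grants = set()

    def grant(self, principal, task, category, lifetime_s, now=None):
        now = time.time() if now is None else now
        self._grants.add(Grant(principal, task, category, now + lifetime_s))

    def revoke_task(self, task):
        # Ending a task removes every authorization bound to it.
        self._grants = {g for g in self._grants if g.task != task}

    def may_read(self, principal, task, category, now=None):
        now = time.time() if now is None else now
        # Access requires a matching, unexpired grant for exactly this task,
        # so a grant obtained for one task cannot be used for another task.
        return any(
            g.principal == principal and g.task == task
            and g.category == category and now < g.expires_at
            for g in self._grants
        )
```

For example, after `acl.grant("tracker-1", "task-42", "track_metadata", lifetime_s=60, now=0)`, a read at `now=30` for `task-42` is allowed, while the same read for another task, or at `now=61`, is refused.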

Flexible Architecture and Trust: Trust in a flexible infrastructure, sensors, tasks
and other components cannot easily be achieved. Defining and evaluating trust
models is difficult, and no sufficient model for a (multi-party) surveillance scenario
exists. It is also an open question how existing systems must be extended to provide
trust. Previously unknown partners, spontaneous interconnection and a changing
number of sensors make it difficult to validate the integrity and authenticity of the
system state or of specific sensors (insertion, removal). Using wireless sensors makes
the integration of sensors easier, but it must still be ensured that detected events and
control data are reliably transmitted.



5.2 Privacy

Privacy is a philosophical term, and everybody has their own definition of it. In terms
of surveillance, the FIP provide minimum requirements, and national law must be
considered as well. However, the sense of privacy changes with the surveillance
context (task) and over time. Hence, systems must be adaptable to changing privacy
requirements. The privacy challenges in surveillance are data protection, trust in
surveillance, and privacy-aware data exchange and communication.

Data Protection: Any surveillance task must be specified exactly, and only data
concerning this task may be generated and collected by the surveillance system.
Hence, all data must refer to its surveillance task(s). For practical solutions, access
controls must be quickly adaptable (granting and revocation) to new surveillance
tasks. The FIP require data minimization. In modern surveillance systems,
this can be divided into minimization of data collection, data processing and data
storage. To minimize data collection, only the required sensors (as few as possible)
may be used, and irrelevant data must be deleted instantly (at sensor level). This
prevents area-wide surveillance and the creation of movement profiles. To ensure
privacy, as little of the collected data and prior knowledge as possible may
be processed (semantic level). If, for instance, an identified object is processed,
only its relevant attributes must be accessible. Again, data that is not useful must be
deleted instantly. Finally, surveillance data is stored, which is also done in a
data-minimizing way: only relevant events and only the relevant sensor data may be
stored. The sensor data must be stored in a privacy-compliant way, i.e., according to
authorization levels, only a subset of the stored data is accessible. Access to stored
data must be as fine-grained as possible. Additionally, to comply with the law, any
observed subject can request information about the personal data related to him and
can demand its correction or erasure. Deployments that are not privacy-aware will not
comply with the law and will not be accepted by the users (society). Modern systems
have the potential to ensure privacy at an unrivaled level. Hence, one component of
the NEST (Network Enabled Surveillance and Tracking, [BEE+08]) architecture
is the Privacy Manager, which is described below. A more detailed description
of privacy in surveillance and the Privacy Manager can be found in [VBEB09].
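The minimization steps described above (drop irrelevant observations at once, release only the attributes a task needs, collect nothing for an unspecified task) can be sketched as a filter. The task names and attribute whitelists below are hypothetical; in a real deployment they would be derived from the task specification and the privacy policies.

```python
# Hypothetical per-task attribute whitelists (assumption, not from NEST).
TASK_ATTRIBUTES = {
    "track-person": {"object_id", "position", "timestamp"},
    "count-visitors": {"timestamp"},
}


def minimize(observations, task, is_relevant):
    """Return only relevant observations, each reduced to the task's attributes."""
    if task not in TASK_ATTRIBUTES:
        return []  # no specified purpose, so nothing is collected
    allowed = TASK_ATTRIBUTES[task]
    kept = []
    for obs in observations:
        if not is_relevant(obs):
            continue  # irrelevant data is dropped immediately, never stored
        kept.append({k: v for k, v in obs.items() if k in allowed})
    return kept
```

The relevance test is passed in as a function because what counts as relevant depends on the concrete task, e.g. "is this the tracked person?".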
Data Exchange and Communication: As mentioned above, data can potentially
be used in another surveillance task, or for the same task in another surveillance
system. To guarantee privacy, person-related data must be secured against
unauthorized usage, and its lifetime and usage in surveillance tasks must be restricted.
Hence, a form of digital rights must be embedded in the surveillance data. In modern
surveillance, any kind of personal information can be exchanged (video, location,
relations, etc.) and fused into a new data type for surveillance. Hence, digital
rights for surveillance metadata are required.

Trust of the Surveillance Subjects: Observed subjects must be able to understand
that their privacy is respected and that person-related data is protected. Trust in
a system may rest on irrational grounds, and it cannot be said which mechanisms
enhance trust in privacy. Openness of the system architecture and certification of
privacy compliance seem to be adequate solutions. The functionality of a surveillance
system must be specified exactly and restricted to its purpose, and the technical
mechanisms for privacy enforcement must be certified by a trusted instance.
'Control of the controllers' and control over one's personal data collected by the system
will also enhance trust in the surveillance system.

A modern surveillance system must consider all of the challenges named above,
and these issues must be addressed right from the beginning: security by design and
privacy by design. In surveillance, both are subjects of current research,
and innovative surveillance technology requires innovative security and privacy
solutions.



6 A Framework for Privacy Enforcement 

The Fair Information Principles are the minimum privacy requirements for the
processing of person-related data. In the following, a framework for privacy
enforcement that is compliant with the FIP is presented.



6.1 Fair Information Principles 

The Guidelines on the Protection of Privacy and Transborder Flows of Personal
Data serve as the basis for the EU directives on data protection (95/46/EC,
2002/58/EC), which must be implemented by the member states. The guidelines
were published by the OECD in 1980. Even though the legal situation concerning
privacy and data protection should be the same throughout the EU, surveillance and
data protection are handled differently in each member state. The legal status in the
US is different again. The guidelines contain eight principles for privacy (see below),
which should be considered by any legislation. Because the legal situation is
inhomogeneous, these principles can be considered minimum requirements for
surveillance systems. Solutions that enforce privacy must deal with all principles, but
must be flexible enough to adapt privacy to future (legal and sociological) requirements.

(P1) Data Collection Limitation Principle: There should be limits to the
collection of personal data and any such data should be obtained by lawful and fair
means and, where appropriate, with the knowledge or consent of the data subject.

(P2) Data Quality Principle: Personal data should be relevant to the purposes for 
which they are to be used, should be accurate, complete and kept up-to-date. 

(P3) Purpose Specification Principle: The purposes for which personal data are 
collected should be specified not later than at the time of data collection. 

(P4) Use Limitation Principle: Personal data should not be disclosed, made
available or otherwise used for purposes other than those specified in accordance
with P3, except with the consent of the data subject or by the authority of law.

(P5) Security Safeguard Principle: Personal data should be protected by rea- 
sonable security safeguards against such risks as loss or unauthorized access, 
destruction, use, modification or disclosure of data. 

(P6) Openness Principle: There should be a general policy of openness about 
developments, practices and policies with respect to personal data. 

(P7) Individual Participation Principle: An individual should have the right to
obtain confirmation of whether or not data relating to him has been collected, to
challenge data relating to him and, if the challenge is successful, to have the data
erased, rectified, completed or amended.

(P8) Accountability Principle: A data controller should be accountable for 
complying with measures which give effect to the principles stated above. 

The following sections explain how the eight principles (P1-P8) are
achieved by a framework for privacy management and a task-oriented approach.
Security and privacy are closely related (see P5), and besides these privacy
requirements, security challenges exist that must be addressed by smart surveillance
systems.

6.2 The NEST Architecture 

In the smart surveillance architecture NEST, an operator specifies surveillance
tasks. He does not have to observe a great number of monitors and other sensors; instead,
the system notifies him about events concerning his tasks (management by exception).
NEST is a Service Oriented Architecture (SOA), and the operator can create any
surveillance task by composing services, e.g., path finding or person tracking
services. Due to the flexible SOA design, different sensor types can be integrated and
a large number of data sources can be linked into the system. All information
is stored in the central Object Oriented World Model (OOWM) [BEVB09]. It is
extracted from the sensors and fused at a higher level of abstraction. Hence, the
OOWM is a good starting point for establishing privacy and security.

Another unique characteristic of the NEST architecture is plug and protect. If a
sensor is plugged into the network, it automatically registers with the surveillance
deployment and transmits its configuration details. In an open-world scenario, many
sensors are not known beforehand, so the authenticity of sensors and trust in the
corresponding data are a challenge. In NEST, a web of trust for surveillance sensors is
established to build trust in the deployed sensors. The NEST framework focuses
on the key challenges that match the unique characteristics of the NEST
architecture: privacy and data protection via the Object Oriented World Model,
SOA security for surveillance deployments, and plug and protect. In the following,
the approaches for privacy enhancement and trust in surveillance sensors are
presented.



6.3 Task-Oriented Approach for Privacy Enforcement 



In a task-oriented surveillance system, the usage of a resource and each processing
step are assigned to a concrete surveillance task. The approach has two great
advantages: on the one hand, resources can be used more efficiently, and on the other
hand, data can be processed according to the FIP. Suppose, for instance, that one specific
person is to be tracked in a central station. A sensor-oriented approach examines
the entire scene including the requested person, whereas a task-oriented approach monitors
only the relevant person and ignores the others. Law and the FIP (P3) require
that the purpose of a surveillance task is specified before the task is executed.
If a task is specified strictly according to its purpose, a task-oriented system such as
NEST can ensure the best possible privacy and data protection for the observed subjects.
As processing is task-related, person-related data can be isolated in case of
multiple surveillance tasks, and privacy protection mechanisms can be established very
granularly according to the requirements of the task. Hence, a task-oriented system
is both efficient and privacy-aware. Essential for privacy enforcement is the OOWM,
which is encapsulated by a privacy framework, the Privacy Manager (PM), shown in
Figure 6.1.






Hauke Vagts 




Figure 6.1: The Privacy Manager 



6.4 Framework for Task-Oriented Privacy Enforcement 

The PM enforces compliance with privacy guidelines by restricting access to
the OOWM according to the deployed privacy policies for guidelines and law. It
is directly linked to the World Model and hosts a Security Enforcement Sub-Module
(SESM). The latter enforces the actual access controls derived from the
privacy policies, performs authenticity checks and manages cryptographic keys.
Besides the Task Management (TM), the Low-Level Sensor Planner (LLSP) is also
essential for privacy enforcement; they are connected via the application bus and the
low-level bus, respectively. The framework contains modules for anonymization,
identity management and user interaction (erasure or correction of personal data) as
well as a policy repository. If data is exchanged with other surveillance systems, the PM
attaches digital rights to ensure that information can only be used for a specific
task. All components in the PM are geared to task orientation and enforce privacy
according to the FIP.

Privacy Enforcement Controller (PEC): The Privacy Enforcement Controller
is the central interface; it receives and processes data requests from high-level
services and controls all privacy-related modules (the achievement of the FIP is shown
below). Requests are handled according to predefined privacy policies that specify
legal guidelines. To control and minimize data collection at sensor level, the PEC
also interacts with the LLSP and the TM. The SESM is also controlled by the PEC.

Identity Management (IdM): To guarantee privacy, the Identity Management
performs multi-layer identity management, i.e., the IdM handles object IDs at
three levels: sensor level, operational level (in the OOWM) and access level
(semantic level). For the latter, the IdM keeps track of surveillance tasks and creates
virtual IDs that anonymize user data, so that least privilege is applied to privacy-sensitive data.
Additionally, it must be infeasible to combine information from different surveillance
tasks; this separation is also achieved by virtual IDs. Problems occur at the operational
level if access controls change dynamically. At sensor level, it must be
ensured that gained information is assigned to the proper objects in the OOWM.
Problems occur if objects are split up or fused due to misinformation.
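One way to obtain per-task virtual IDs that cannot be linked across tasks is a keyed hash of the task and the internal object ID. The sketch below assumes HMAC-SHA256 with a secret held only by the IdM; this choice, the class name `VirtualIdMapper` and the 16-character truncation are assumptions for illustration, not the mechanism specified for NEST.

```python
import hashlib
import hmac
import secrets


class VirtualIdMapper:
    """Hypothetical IdM helper: internal OOWM object IDs -> per-task virtual IDs."""

    def __init__(self, key=None):
        # Secret known only to the IdM; without it, virtual IDs of the same
        # object in different tasks cannot be linked.
        self._key = key if key is not None else secrets.token_bytes(32)
        self._reverse = {}  # (task, virtual_id) -> internal ID, kept inside the IdM

    def virtual_id(self, task, object_id):
        msg = f"{task}\x00{object_id}".encode()  # separator avoids ambiguous joins
        vid = hmac.new(self._key, msg, hashlib.sha256).hexdigest()[:16]
        self._reverse[(task, vid)] = object_id
        return vid

    def resolve(self, task, vid):
        # A virtual ID is only resolvable within the task that issued it.
        return self._reverse.get((task, vid))
```

The same object receives a stable ID within one task (so tracking still works) and a different, unlinkable ID in every other task.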

Anonymizer (AM): The Anonymizer is closely linked to the IdM and ensures
privacy-compliant access to information about objects. The AM enforces maximum
privacy for different accesses by anonymization. If possible (depending on the
surveillance task), location requests and attribute requests are anonymized. In most
cases, access to sensitive attributes is restricted to a subset of the services of a
surveillance task. In general, as little information about an object as possible should
be provided to a service. Depending on the surveillance task, imprecision or deliberate
errors can be added.
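Location anonymization of the kind described above can be sketched with two standard techniques: snapping a position to a grid cell (imprecision) and adding zero-mean Gaussian noise (deliberate error). The level names, cell sizes and noise level below are hypothetical stand-ins for values that would come from the task's privacy policy.

```python
import random


def coarsen_location(x, y, cell_size):
    """Report only the centre of the grid cell containing (x, y) (imprecision)."""
    cx = (x // cell_size) * cell_size + cell_size / 2
    cy = (y // cell_size) * cell_size + cell_size / 2
    return cx, cy


def perturb_location(x, y, sigma, rng=random):
    """Add zero-mean Gaussian noise with standard deviation sigma (deliberate error)."""
    return x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)


def anonymize_location(x, y, level, rng=random):
    # Hypothetical task-dependent levels; a real policy would define these.
    if level == "exact":
        return x, y
    if level == "room":
        return coarsen_location(x, y, cell_size=5.0)
    if level == "zone":
        cx, cy = coarsen_location(x, y, cell_size=50.0)
        return perturb_location(cx, cy, sigma=10.0, rng=rng)
    raise ValueError(f"unknown anonymization level: {level}")
```

Coarsening alone is deterministic and therefore repeatable for the same position; the added noise makes it harder to recover the exact cell from repeated requests only if fresh noise is not averaged out, which is a limitation a real design would have to address.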

Digital Rights Management (DRM): The task of this module is to attach digital
rights to any information that is sent to a service or to another surveillance
deployment (OOWM). This guarantees that data is only accessible during the execution
of a surveillance task or even just a subtask. The lifetime of data is restricted, and data
is only available to authorized services. However, even if the information flow
can be controlled, services must be trusted: once information has been observed,
it might be reproduced and misused.
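The rights attached by the DRM can be pictured as an envelope that binds the payload to permitted tasks, services and a lifetime, protected against modification. The sketch below uses a shared HMAC key and JSON purely for brevity; the envelope layout and function names are hypothetical, and a real deployment would more likely use digital signatures and encryption. As the text notes, no such envelope stops a service that has already seen the data from copying it.

```python
import hashlib
import hmac
import json


def attach_rights(payload, tasks, services, expires_at, key):
    """Wrap payload with a hypothetical rights record and an HMAC over both."""
    rights = {"tasks": sorted(tasks), "services": sorted(services), "expires_at": expires_at}
    body = json.dumps({"payload": payload, "rights": rights}, sort_keys=True)
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def open_if_permitted(envelope, task, service, now, key):
    """Return the payload only if the envelope is intact and the rights allow this use."""
    expected = hmac.new(key, envelope["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope["tag"]):
        raise ValueError("rights record or payload was modified")
    content = json.loads(envelope["body"])
    rights = content["rights"]
    if now >= rights["expires_at"]:
        return None  # lifetime of the data has expired
    if task not in rights["tasks"] or service not in rights["services"]:
        return None  # not authorized for this task or service
    return content["payload"]
```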

Subject Interaction (SI): The Subject Interaction module handles the interaction 
between an observed subject and the surveillance system. The subject can re- 
quest personal data related to him and can induce correction or erasure. In some 
surveillance scenarios a subject can import his own policies. Different options for 
interaction with a surveillance system can be imagined, for instance: a personal 
device, a kiosk or simply pen and paper. 

Privacy Policies (PP): Privacy policies ensure a certain level of privacy for the 
surveillance deployment. Policies concern one or more surveillance tasks (global 
policies) or can be user specific (personal policies). Global policies are enforced 
to achieve compliance with data protection law and the FIP. By using personal 
policies the observed subject can specify a personal trade-off between functionality 
and privacy. 
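How global and personal policies might combine can be sketched as follows, assuming (this is not stated in the text) that a global policy lists mandatory attributes a task always receives and optional attributes released only with the subject's consent. A personal policy then expresses the subject's trade-off but can neither remove mandatory attributes nor add attributes the global policy does not list. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalPolicy:
    """Hypothetical task-level policy derived from law and the FIP."""
    task: str
    mandatory: set                               # always released to the task
    optional: set = field(default_factory=set)   # released only with consent


def effective_attributes(policy, personal_opt_in=None):
    """Attributes released for one subject: mandatory plus consented optional ones."""
    chosen = personal_opt_in or set()
    return set(policy.mandatory) | (set(policy.optional) & set(chosen))
```

A task such as theft protection would list everything as mandatory, so personal policies have no effect there, which matches the remark under P7 that not all tasks allow personalization.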

Security Enforcement and Security Policies: As mentioned, security is closely
related to privacy. The SESM manages cryptographic keys and certificates,
ensures the authenticity of service end points and the confidentiality of transmitted
data. The SESM also logs any (attempted) access to the world model. It
deploys and enforces the access controls derived from the privacy and security
policies. The latter specify authorizations for services and resources that are not
privacy related.



6.5 Achievement of the Fair Information Principles 

Principles P3 and P6 cannot, by definition, be achieved by the PM. The purpose
for which personal data is collected must be specified before the surveillance task
is started; most legislations require that the entire surveillance task (purpose) is
specified before it is started. P6 cannot be achieved by the PM and SM either.
Information about the architecture, policies and operators must be easily accessible
to surveillance subjects.

Data Collection Limitation Principle (P1): The collection of data is first
minimized at sensor level, i.e., the sensor services select only the potentially required
sensors for a surveillance task. As a result, only potentially relevant information is
fused in the OOWM, and the relation to a specific task exists right from the start.
However, sensors can still deliver more information than a specific surveillance
task requires. Hence, the AM removes irrelevant information before the
response or event is sent back.

Data Quality Principle (P2): Relevance of data is already achieved by the
task-oriented approach used for P1. Data quality is achieved by a Data Quality Module
in the SESM, which performs integrity checks of the existing data, especially if
data has been altered by external services. The OOWM core, more precisely the
internal Instance Manager, and the IdM are responsible for the freshness and correctness
of the instantiated objects (for details, see [BEVB09]).

Use Limitation Principle (P4): Use of data is restricted according to the
surveillance task. Therefore, access controls are enforced by the SM: access is granted
to all involved services for the duration of the task, and such general Security
Policies are stored in the SESM. To enhance privacy, more specific Privacy
Policies can be specified that describe which attributes are accessible to particular
services. Data should only be used in a specific context and only during the execution
of the corresponding task. Hence, any information that leaves the World Model is
coupled with digital rights by the DRM. This is especially important
if data is exchanged between OOWMs. A service or OOWM must have a
valid credential to process the requested data; e.g., if a credential has expired, the
service or OOWM cannot process information about a subject, and the credential must
be requested again.

Security Safeguard Principle (P5): Established security mechanisms and
protocols, for instance certificates (Public Key Infrastructures) or IPsec, are used to
achieve confidentiality, integrity and availability (CIA) in the NEST architecture.
Although these methods are sufficient, more specific security mechanisms would
improve efficiency. In Section 7, a web of trust for surveillance sensors is proposed to
enhance trust in the authenticity of surveillance sensors. Hence, only sensor services
that are assumed to be trusted are allowed to deliver information into the OOWM.

Individual Participation Principle (P7): Besides the general privacy policies
mentioned above (P4), data subjects can also specify personal privacy policies,
i.e., a data subject can find his personal trade-off between efficiency and privacy.
Naturally, not all surveillance tasks (e.g., theft protection) allow personalization.
The SI handles the interaction between the surveillance subjects and the surveillance
system and enables subjects to request the personal data related to
them. They can induce erasure (if it is compliant with the surveillance task) or
correction of their personal data.

Accountability Principle (P8): Any service performed by a module inside the
OOWM, any external access by an actor and any data integration by a sensor is
logged. If, for some reason, a violation of access rules occurs, the operator is
notified about it. These logs cannot be altered by the operator. Hence, they can be used
to prove proper processing of personal data.
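One common way to make such logs tamper-evident is a hash chain, in which every entry commits to its predecessor. The sketch below is an illustration of that technique, not the logging mechanism of NEST; the record fields are hypothetical. A chain alone detects altered, inserted, reordered or removed (non-final) entries; detecting truncation of the tail or a completely recomputed chain additionally requires anchoring the latest hash outside the operator's control, e.g. by signing it or handing it to an auditor.

```python
import hashlib
import json


class AuditLog:
    """Append-only log where each entry commits to its predecessor (hash chain)."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(record):
        return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

    def append(self, actor, action, target):
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        record = {"actor": actor, "action": action, "target": target, "prev": prev}
        digest = self._digest(record)
        self.entries.append({**record, "hash": digest})
        return digest  # latest head; anchor it externally to detect truncation

    def verify(self):
        """Return True if every entry is unmodified and correctly chained."""
        prev = self.GENESIS
        for entry in self.entries:
            record = {k: entry[k] for k in ("actor", "action", "target", "prev")}
            if entry["prev"] != prev or entry["hash"] != self._digest(record):
                return False
            prev = entry["hash"]
        return True
```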



7 Web of Trust for Smart Surveillance Sensors 

One challenge for modern surveillance systems is the establishment of trust in
unknown partners and sensors; even the authenticity of self-introduced sensors
cannot be ensured. Potential surveillance partners might not trust each other, and
establishing a common root CA can be difficult. Hence, we propose a web of
trust for building trust in surveillance sensors. The idea of a web of trust has
been used in PGP² to establish trust in digital signatures. In the case of surveillance,
a web of trust can be used to assign trust to known surveillance operators,
which is then used to calculate the authenticity of sensors. A surveillance system operator
A collects public keys of other operators (parties) in his public key ring ($K_{pub}$)
and sets the owner trust for other surveillance system operators. He can set the
trust to complete if he has full confidence in an operator B, or to marginal in case
of marginal trust. If A has given complete trust to B, he considers any sensor
digitally signed by B to be authentic. In case of marginal trust, any sensor $s_B$ signed
by B is partly trusted, i.e., information gained by $s_B$ is weighted as less authentic.
Another marginally trusted party C must sign $s_B$ to achieve full authenticity
of $s_B$. A sensor $s$ is authentic if

$A(s) = \frac{x(s)}{X} + \frac{y(s)}{Y} \geq 1,$

where $x(s)$ denotes the number of marginal trusts and $y(s)$ the number of complete
trusts for $s$. $X$ and $Y$ denote the number of marginal and complete trusts, respectively,
required to achieve authenticity. If $A(s) < 1$, data received from
$s$ is not considered for surveillance tasks. As in PGP, $X = 2$ and $Y = 1$ seem to
be reasonable, but a surveillance operator can choose more restrictive values.
Optionally, the information gained from $s$ can be weighted according to $A(s)$. A can
also establish direct trust in $s$ by signing the sensor directly. To extend his web of
trust, A can specify trusted introducers. If B is a trusted introducer of A, A trusts
all operators $h_i$, $i \in \{1, \dots, n\}$, trusted by B (marginally or completely).
Figure 7.1 shows a web of trust with six surveillance parties (A-F). A has self-signed his
own three sensors and one sensor run by C; hence, they are all completely authentic.
A has complete trust in B; consequently, the sensor signed by B is also completely
authentic. A marginally trusts C and D; hence, in each of their domains one sensor is
marginally authentic. The cumulative trust in C and D results in complete authenticity
of one of the sensors operated by E. The other sensor is only signed by E,
and A does not know E; hence, this sensor is not authentic. B is a trusted introducer of
A; thus, the sensor operated by F is completely authentic.

2 http://www.ietf.org/rfc/rfc4880.txt

Figure 7.1: Web of Trust for Surveillance Sensors
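The authenticity rule $A(s) = x(s)/X + y(s)/Y \geq 1$ and the trusted-introducer extension can be sketched directly. Two points are assumptions of this sketch, not statements of the text: A's own trust settings take precedence over those adopted from introducers, and A's self-signature counts as a complete trust. Exact fractions avoid rounding when comparing against 1.

```python
from fractions import Fraction


def effective_owner_trust(own_trust, introducers, trust_of):
    """Owner trust of A, extended by the assignments of its trusted introducers.

    own_trust:   dict operator -> "complete" | "marginal" (A's own settings)
    introducers: operators whose trust assignments A adopts
    trust_of:    dict operator -> that operator's own owner-trust dict
    """
    trust = {}
    for intro in introducers:
        trust.update(trust_of.get(intro, {}))
    trust.update(own_trust)  # assumption: A's own settings take precedence
    return trust


def authenticity(signers, trust, X=2, Y=1):
    """A(s) = x(s)/X + y(s)/Y for a sensor signed by the given operators."""
    x = sum(1 for op in signers if trust.get(op) == "marginal")
    y = sum(1 for op in signers if trust.get(op) == "complete")
    return Fraction(x, X) + Fraction(y, Y)


def is_authentic(signers, trust, X=2, Y=1):
    return authenticity(signers, trust, X, Y) >= 1
```

Modelling the scenario of Figure 7.1 with `own = {"A": "complete", "B": "complete", "C": "marginal", "D": "marginal"}` and `trust_of = {"B": {"F": "complete"}}`: a sensor signed only by C gets $A(s) = 1/2$, one signed by C and D (and E) reaches 1, one signed only by E gets 0, and the sensor signed by F is authentic via the introducer B.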

However, the major issue of a web of trust is that operators could carelessly trust
other operators to get a huge network of authentic sensors instead of a
smaller but trustworthy sensor network. The authenticity achieved by a web
of trust reflects the trust in a sensor from the point of view of a system operator; it is not
a qualified signature, i.e., data gained by trusted sensors must additionally be
authenticated to guarantee that it can be used in court.



8 Conclusion and Future Work 

The proposed Privacy Manager can be used in task-oriented surveillance systems.
Due to its flexibility, it can be adapted to any privacy requirements. Future legal
and social requirements must be specified, and the approach must be discussed
among the involved parties. Similarly, the web of trust for surveillance sensors can
(technologically) solve a security challenge, but its legal aspects must be discussed
with lawyers and its practicability with system operators.

Nevertheless, much research remains to be done in the area of privacy and security
for surveillance systems. New anonymization techniques must be developed and
explored in practice. New approaches for privacy policies are also necessary:
existing languages do not fit surveillance circumstances and do not provide sufficient
mechanisms for automated derivation and flexible deployment. To enhance the
web of trust, degrees of belief will be combined with authenticity mechanisms to
couple trust, sensor signal quality and probabilities.
Abstract: The analysis of data derived from advanced image acquisition
techniques is gaining influence in the image processing industry. One major
issue in industrial visual inspection is the detection of anomalies.
Rather than detecting anomalies by describing them with features, or by
explicitly describing the non-anomalous case, auto-regressive
models provide a way to eliminate expected patterns and emphasize the
unexpected: the anomalies.

This paper introduces a new class of auto-regressive (AR) models that can
handle data containing different kinds of modalities, where the modalities
represent different aspects of the inspected surface. The theoretical
background of the AR models is presented, explained and analyzed.



1 Introduction 

Recent developments in data acquisition and processing and, not least, increased
computational power make it feasible to acquire and process more
information about a specimen under examination. This additional information may
provide further clues for the detection, recognition and identification of anomalies
and structures in general. The original and additional information are referred to here
as multi-modal data.

The need to archive quality assessment data, the need for reproducible, objective
quality decisions and the sheer number of products to assess are compelling
reasons to assist humans in this task. In automated visual inspection, detection
could simply be accomplished in regions free of any structure. If the region is
cluttered with structure, however, advanced methods are required.

One methodology is to build a vector containing features derived from an
image. These feature vectors are classified to distinguish regions that contain
only expected structures from regions containing anomalies. Another methodology
is to model the expected structures, apply this knowledge and thereby reduce the
complexity of the remaining task. The auto-regressive (AR) models considered in this
report allow modeling the expected structures using statistical methods. With
these models, it is possible to capture the expectations (spatial as well as other
information) in a statistically optimal way and make this knowledge useful in the
inspection.



1.1 Contributions 

The novelty of this paper consists of two flexible auto-regressive models that are adapted
to multi-modal data. Combining the ability of AR models to adapt to any pattern
with multi-modal data, which can be obtained through emerging technologies in
image acquisition and processing, allows robust and reliable emphasis of
unexpected artifacts.

With the multi-modal AR models introduced in this report, the detection of
structures is possible even when the information required for detection is spread over
different modalities and is only significant in combination. Using the maximum
accessible amount of information potentially enables very reliable
anomaly detection, even in regions covered with other structures.



1.2 Related Work 

Many procedures that are studied today and deployed in industry generate
feature vectors that are adapted to the surface and structures currently examined.
These vectors may be built from different feature-generating processes. Each
vector is subsequently classified to indicate whether a region contains an anomaly or
not.

[KP02] use Gabor wavelet features for the description of regions. Suitable Gabor
filters are selected by comparing defect-free and defective regions, and simple
thresholding is used to distinguish these regions. [ZPY01] use specially designed
wavelets for the detection of defects on fabric. In [PP01], outlier detection is
conducted using robust covariance matrices.
In the domain of auto-regressive models, the VAR (vector auto-regressive) model
and the VARMA (VAR moving average) model are the concepts closest to the
multi-modal AR models proposed here. However, they are restricted to the
one-dimensional case and are therefore unsuitable for detecting anomalies
in multi-modal image data. Furthermore, they are not easily extensible to higher-dimensional
configurations.



2 Definition of Multi-Modal Data 



In the most basic sense, multi-modal data contain different kinds of information
for a single location. In theory, there are no further requirements such as
continuity or equal spacing of the information if the data is discretized.

To derive a more manageable and appropriate data structure for our case, we
impose one additional constraint. We refer to data as multi-modal if for each
spatial location (two-dimensional in image-based data and three-dimensional in
volume-based data) several additional scalar values exist, identical in number for all locations.
Furthermore, we require that this information is equally spaced.

The structure of each location in a multi-modal dataset may be expressed as shown
in Table 2.1.



Spatial system                  Element at each location in the multi-modal data x

2-dimensional (images, ...)     $x_{m,n} = (x_{m,n,c=1}, x_{m,n,c=2})^T$,
                                e.g. $c = 1$: gray value at $(x, y)$; $c = 2$: surface inclination at $(x, y)$

s-dimensional                   $x_i = (x_{i,c=1}, \dots, x_{i,c=d})^T$,
                                where $i \in \mathbb{N}^s$ is used to index elements

Table 2.1: Examples of the structure of multi-modal datasets

As can be seen, it is possible and mathematically feasible to express a multi-modal
dataset as a tensor¹ $x \in \mathbb{R}^{s \times d}$, with the dimensions $s$ for the spatial extent and
$d$ for the number of different modalities. The first $s$ dimensions, which
are most commonly associated with space, are primary for the multi-modal data;
everything is aligned to them.

1 A tensor is a generalization of matrices. A tensor of order $n$ consists of several scalar elements,
each of which is accessible by an index $i \in \mathbb{N}^n$. A 2D matrix is a tensor of
order 2. The values contained in a tensor are sometimes called Muxels [PMGC09].
In a broader sense, multi-modal data does not have to rely on a spatial reference
as its primary reference. Other referencing systems are possible and in some
cases appropriate. This choice is solely application-dependent and does not interfere
with the further mathematical modeling of the multi-modal auto-regressive models. The
single requirement for the remainder of this paper is that multi-modal data have to
be equally spaced, as shown in Table 2.1.



Weak Stationarity 

The definition of the AR models as whitening filters provides the key to
monitoring their adaptation. Indirectly, the property of weak stationarity of the input
data ensures the applicability of the models.



Weak stationarity is a property of data and derives from modeling an image source as a stochastic process. Each multi-modal image x is a prototype function of this stochastic process. Weak stationarity comprises two properties which ensure that the statistics of the data, up to the second statistical moment, are independent of location:



$$E\{x_a\} = E\{x_{a+d}\} \quad \forall d \text{ if } a,\ a+d \text{ are locations within } x$$

$$\mathrm{Cov}\{x_a, x_{a+d}\} = \mathrm{Cov}\{x_b, x_{b+d}\} \quad \forall d \text{ if } a,\ b,\ a+d,\ b+d \text{ are locations within } x$$



These two equations ensure that the expected value is the same at every location in the region concerned and that the statistical spatial relations within the region are also independent of location.

The property of weak stationarity ensures that a model which has been built in only one small region is applicable to the whole data. To make assertions about the stochastic process itself, e.g., in pattern analysis, both weak stationarity and the ergodic hypothesis have to be fulfilled; ergodicity, however, is not required for the results presented in this paper.
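Weak stationarity can only be checked empirically on a single image. The following sketch, which is our own heuristic and not a procedure from this paper, estimates the mean and the lag-d autocovariance of one modality in several regions and compares their spread against a tolerance. The function names, region format and tolerance are illustrative assumptions.

```python
import numpy as np

def region_moments(x, rows, cols, lag):
    """Sample mean and lag-`lag` autocovariance of one modality inside a region.

    x    : 2D array (one modality of a multi-modal image)
    rows : (r0, r1) half-open row range of the region
    cols : (c0, c1) half-open column range of the region
    lag  : (dm, dn) non-negative spatial offset d
    """
    r0, r1 = rows
    c0, c1 = cols
    dm, dn = lag
    if dm < 0 or dn < 0 or r1 + dm > x.shape[0] or c1 + dn > x.shape[1]:
        raise ValueError("shifted region must lie within the image")
    a = x[r0:r1, c0:c1]                        # samples x_a
    b = x[r0 + dm:r1 + dm, c0 + dn:c1 + dn]    # samples x_{a+d}
    mean = a.mean()
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    return mean, cov

def looks_weakly_stationary(x, regions, lag, tol):
    """Heuristic check: first and second moments must agree across all regions."""
    stats = np.array([region_moments(x, r, c, lag) for r, c in regions])
    spread = stats.max(axis=0) - stats.min(axis=0)   # per-moment spread over regions
    return bool(np.all(spread < tol)), stats
```

A horizontal intensity ramp, for instance, changes the mean from region to region and is rejected, which is the kind of structure that has to be removed before an AR model is applied.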



3 Multi-Modal Auto-Regressive Models 

The idea behind the use of auto-regressive models is the explicit adaptation to expected structures, followed by statistical testing. The conducted test is called the null hypothesis test ($H_0$ test). It verifies whether or not the model is valid for the region it is applied to. If the model is still valid, the tested region is sufficiently close to the region the model was created for. The test is therefore an indicator of compliance with expectations.
[Figure 3.1 panels: (a) 3D/1, (b) 2D/d]

Figure 3.1: This sketch visualizes both multi-modal auto-regressive models. The AR coefficients are placed according to their application on the multi-modal data. Same colors indicate the same weight or factor.



All AR models are based on the same principle. They are linear, translation-invariant filters and are calculated using a convolution. The estimation of the model parameters and filter coefficients is a crucial part of every model and is discussed in detail in the following sections. AR models use only a finite number of filter coefficients. They therefore belong to the class of FIR (finite impulse response) filters and are subject to the issue of stability. The stability of the filters is discussed in the following sections dealing with the estimation of the filter coefficients.
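To make the filter view concrete, here is a minimal sketch of a single-channel 2D AR prediction-error filter. It assumes that the coefficients are given as a dictionary from offsets (k, l) to values and computes $\hat{x}_{m,n} = \sum a_{k,l} x_{m-k,n-l}$ as a sum of shifted image copies, which is equivalent to a convolution restricted to locations whose whole support lies inside the image. The function name and input format are hypothetical.

```python
import numpy as np

def ar_prediction_error(x, coeffs):
    """Prediction error e = x - x_hat of a 2D AR model on a single-channel image.

    coeffs maps an offset (k, l) to a coefficient a_{k,l} (hypothetical format);
    x_hat[m, n] = sum_{(k,l)} a_{k,l} * x[m-k, n-l].
    Only locations whose whole support lies inside the image are returned.
    """
    if (0, 0) in coeffs:
        raise ValueError("the mask must not contain the point of reference")
    ks = [k for k, _ in coeffs]
    ls = [l for _, l in coeffs]
    M, N = x.shape
    # valid m satisfy 0 <= m - k < M for every k in the mask (same for n and l)
    m0, m1 = max(0, max(ks)), min(M, M + min(ks))
    n0, n1 = max(0, max(ls)), min(N, N + min(ls))
    x_hat = np.zeros((m1 - m0, n1 - n0))
    for (k, l), a in coeffs.items():
        x_hat += a * x[m0 - k:m1 - k, n0 - l:n1 - l]   # shifted copy weighted by a_{k,l}
    return x[m0:m1, n0:n1] - x_hat
```

For example, the coefficients $a_{0,1} = 2$, $a_{0,2} = -1$ predict a linear ramp exactly, so the prediction error vanishes.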

The following two novel AR models are introduced. The 3D/1 AR model uses statistical information from all modalities of the multi-modal data simultaneously to adapt a model to the presented texture. The 2D/d AR model imposes a simplification that takes advantage of a special constraint on the statistical relations in the data. The simplification results in a less complicated optimization problem, faster calculation and easier parameterization. Both models are depicted in Fig. 3.1.



3.1 $U$, $U_c$, $\mathcal{E}$ and Point of Reference



The coefficient model mask $U$ or $U_c$

The coefficient model mask $U$ defines the supporting area which is used for the calculations that lead to the prediction at the point of reference. The mask has a great influence on the quality of the prediction and is used in the prediction (see Eqns. (3.1) and (3.4)) as well as in the estimation (see Eqns. (3.3) and (3.5)). $U_c$ defines the data elements that are to be used and may be interpreted as an index set.



The design of the mask has to take the statistical properties of the structure into account, i.e., the mask should be designed such that the statistical properties of the data are captured sufficiently.
Figure 3.2: The multi-modal image in this paper consists of three axes: the traditional two spatial ones and one axis that indicates the modal dimension. This figure shows a coefficient model mask $U_c$ of size 7 × 5 × 4 with the point of reference in c = 1, located rightmost in $U_c$.
Figure 3.3: The interplay of the coefficient model mask, the point of reference and the resulting vector $\underline{x}$ that is constructed according to Eqn. (3.2).



The AR models are used in a predictive manner. For a true prediction without any influence of the value to be predicted, the coefficient model mask must not include the point of reference.

A few considerations about the general design of the coefficient model mask besides the statistical ones are also worth mentioning: if the coefficient model mask is recursively computable, the associated AR model could be used for texture synthesis. A special case of recursively computable coefficient model masks are the causal coefficient model masks, which date back to the origins of AR models, when they were mostly used in time series analysis.
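A causal mask only refers to pixels that have already been visited in a fixed scan order, which is what makes it recursively computable. The sketch below builds one common variant, a non-symmetric half-plane mask, under the assumption of row-major scanning and the sign convention $x_{m-k,n-l}$ of this paper. The function name and the choice of half-plane shape are illustrative, not taken from the paper.

```python
def causal_mask(half_height, half_width):
    """Offsets (k, l) of a hypothetical non-symmetric half-plane mask.

    With x_hat[m, n] = sum a_{k,l} x[m-k, n-l], offset (k, l) refers to a pixel
    that precedes (m, n) in row-major scan order iff k > 0, or k == 0 and l > 0.
    The point of reference (0, 0) is therefore never included.
    """
    mask = []
    for k in range(0, half_height + 1):
        for l in range(-half_width, half_width + 1):
            if k > 0 or l > 0:          # strictly earlier in the scan order
                mask.append((k, l))
    return mask
```

Because every referenced pixel precedes the current one, a texture can be synthesized pixel by pixel in scan order with such a mask.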

The estimation region $\mathcal{E}$

The filter coefficients are commonly called AR coefficients $\underline{a}$ and are estimated using the coefficient model mask $U$, the estimation region $\mathcal{E}$ and a specific algorithm for determining the coefficients. The estimation region $\mathcal{E}$ is in most cases restricted to one image. However, this restriction is neither necessary nor appropriate in every case. If the estimation region is spread over several images, the stochastic process is captured more directly than it would be under the ergodic hypothesis.

Figure 3.4: Figure (a) visualizes the difference between the region that is used for collecting information (estimation region $\mathcal{E}$, green) and the region that is predicted (light blue). Figures (b), (c), (e) and (f) visualize different possible coefficient model masks (yellow) and points of prediction (red). Figures (d) and (g) show possible estimation regions that might be appropriate for training the model so that only valid or trustworthy image regions are included.

Point of Reference 

The AR models are used to predict the value at the point of reference. This value is calculated from its surrounding using specially optimized AR coefficients ${}^{c}\underline{a}$. The surrounding is built according to the coefficient model mask $U_c$. The predicted value is greatly influenced by the AR coefficients ${}^{c}\underline{a}$, the coefficient model mask $U_c$ and the estimation region $\mathcal{E}$. It plays an important role in distinguishing regions that contain anomalies from regions that do not.



3.2 3D/1 

The modalities are treated as ordinary data dimensions, exactly like the spatial ones (see Fig. 3.2). This implies that the preconditions of AR models must hold across all modalities of the input data and may have to be enforced.

The name 3D/1 is based on the organization of the coefficient model mask and the dimension of the prediction. 3D stands for two spatial and one modal dimension, which makes the coefficient model mask a three-dimensional structure. /1 stands for the scalar value predicted by this model at the point of reference.
Fig. 3.1(a) visualizes how the coefficients are arranged in the coefficient model mask.

The result of the prediction of the AR model on the data has to be weakly stationary. From this requirement it follows by definition that the input data x must also be weakly stationary (see Sec. 2). The noise $w_{m,n,c} \sim \mathcal{N}(0, 1)$ is assumed to be normal, uncorrelated and additive to the image.

After these preliminaries, the elementary mathematical structure of the 3D/1 AR models is defined as follows. Keep in mind that the 3D/1 AR models are dedicated to the prediction of a scalar value only, even in the multi-modal setup.

The indices m, n are used for spatial coordinates and c for the modal coordinate; k, l are used as running indices for spatial coordinates and i for the modal coordinate.



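$$\hat{x}_{m,n,c} = \sum_{(k,l,i)\in U_c} {}^{c}a_{k,l,i}\, x_{m-k,n-l,i} = {}^{c}\underline{a}^T \underline{x}_{m,n}$$

$$x_{m,n,c} = \hat{x}_{m,n,c} + {}^{c}\sigma\, w_{m,n,c} = {}^{c}\underline{a}^T \underline{x}_{m,n} + {}^{c}\sigma\, w_{m,n,c} \qquad (3.1)$$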



$U_c$ is the coefficient model mask which is specific to channel c (Fig. 3.2). It is possible and in some cases beneficial to use a different coefficient model mask $U_c$ for each channel, i.e., $U_{c=1} \neq U_{c=2} \neq \ldots$. In most cases, however, good results are achieved by using equal coefficient model masks for all channels, with only small modifications to account for the causality condition.

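The difference between the actual value at the point of reference $x_{m,n,c}$ and its prediction $\hat{x}_{m,n,c}$ is called the prediction error $e_{m,n,c}$ and is calculated according to:

$$e_{m,n,c} = x_{m,n,c} - \hat{x}_{m,n,c} = x_{m,n,c} - {}^{c}\underline{a}^T \underline{x}_{m,n} = -\,{}^{c}\underline{a}'^T \underline{x}'_{m,n}$$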



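Here ${}^{c}\underline{a}$ and ${}^{c}\underline{a}'$ represent the AR coefficients, and $\underline{x}_{m,n}$ and $\underline{x}'_{m,n}$ contain the data from the surrounding of a given location. They are defined according to

$${}^{c}\underline{a} = \begin{pmatrix} {}^{c}a_{0,1,1} \\ {}^{c}a_{0,2,1} \\ \vdots \\ {}^{c}a_{0,1,2} \\ \vdots \end{pmatrix}, \quad {}^{c}\underline{a}' = \begin{pmatrix} -1 \\ {}^{c}\underline{a} \end{pmatrix}, \quad \underline{x}_{m,n} = \begin{pmatrix} x_{m,n-1,1} \\ x_{m,n-2,1} \\ \vdots \\ x_{m,n-1,2} \\ \vdots \end{pmatrix}, \quad \underline{x}'_{m,n} = \begin{pmatrix} x_{m,n,c} \\ \underline{x}_{m,n} \end{pmatrix}.$$

Both $\underline{x}_{m,n}$ and $\underline{x}'_{m,n}$ take the coefficient model mask $U_c$ into account. Their construction is depicted in Fig. 3.3. To determine a 3D/1 model that is optimally adapted to the multi-modal texture in the estimation region $\mathcal{E}$ on the specimen, the prediction error has to be minimized. This minimization can be performed in different ways. A closed-form solution is proposed here, which also provides the advantage that the result is automatically stable [DM84] in BIBO terms (bounded input, bounded output), at least for data similar to those in $\mathcal{E}$.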

The structure of the 3D/1 models allows not only a simple minimization of the prediction error but also a minimization of the variance of the prediction error, which yields much better results.

$${}^{c}\Gamma = \mathrm{Var}\{e\} = E\{e^2\} - E\{e\}^2 = E\{e^2\} \rightarrow \min \qquad (3.2)$$



The objective ${}^{c}\Gamma$ is minimal, and optimal in the least squares sense over the estimation region $\mathcal{E}$, if the AR coefficients ${}^{c}\underline{a}$ are optimally determined. Eqn. (3.2) exploits the property of an optimal approximation by assuming that the squared mean of the prediction error vanishes, $E\{e\}^2 = 0$.



The weak stationarity of e (used in Eqn. (3.2)) is needed to provide the statistical justification for using the prediction error of only one prototype function (image x) in place of all other possible realizations of the stochastic process. This methodology of using only one prototype function instead of the whole stochastic process is called the ergodic hypothesis.



One special assumption of the 3D/1 AR models is a coefficient model mask $U_c$ that completely fills the modal direction c. Therefore, the summation within ${}^{c}\Gamma$ runs only over the two spatial components $(m, n) \in \mathcal{E}$.



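$${}^{c}\Gamma' = \sum_{(m,n)\in\mathcal{E}} e_{m,n,c}^2 = \sum_{(m,n)\in\mathcal{E}} x_{m,n,c}^2 - 2\,{}^{c}\underline{a}^T \sum_{(m,n)\in\mathcal{E}} \underline{x}_{m,n}\, x_{m,n,c} + {}^{c}\underline{a}^T \Big(\sum_{(m,n)\in\mathcal{E}} \underline{x}_{m,n}\underline{x}_{m,n}^T\Big)\, {}^{c}\underline{a}$$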




For the minimization of ${}^{c}\Gamma'$, which also yields a minimal ${}^{c}\Gamma$, it is necessary to differentiate ${}^{c}\Gamma'$ with respect to the only free variable ${}^{c}\underline{a}$. For a true minimum, the Hessian $2\sum \underline{x}_{m,n}\underline{x}_{m,n}^T$ has to be positive definite, which in practice is always the case according to [Mak75].



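$$\frac{\partial\,{}^{c}\Gamma'}{\partial\,{}^{c}\underline{a}} = -2\sum_{(m,n)\in\mathcal{E}} \underline{x}_{m,n}\, x_{m,n,c} + 2\sum_{(m,n)\in\mathcal{E}} \underline{x}_{m,n}\underline{x}_{m,n}^T\, {}^{c}\underline{a} = \underline{0}$$

$$\Rightarrow\quad {}^{c}\underline{a} = \Big(\sum_{(m,n)\in\mathcal{E}} \underline{x}_{m,n}\underline{x}_{m,n}^T\Big)^{-1} \sum_{(m,n)\in\mathcal{E}} \underline{x}_{m,n}\, x_{m,n,c} \qquad (3.3)$$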



${}^{c}\underline{a}$ solves the optimization problem ${}^{c}\Gamma$ and forms the coefficients describing the spatial information of the texture. The global information of the texture, ${}^{c}\sigma^2$, is obtainable by evaluating a simplified ${}^{c}\Gamma$. The function $c(\mathcal{E}, U_c) \in \mathbb{N}$ indicates the number of coefficient model masks that could be placed within $\mathcal{E}$.

$${}^{c}\sigma^2 = \frac{1}{c(\mathcal{E}, U_c)} \sum_{(m,n)\in\mathcal{E}} e_{m,n,c}^2$$

[Figure 3.5: modalities c = 1, ..., d on both axes; blocks labeled cov{c = 1, c = d}, ..., cov{c = d, c = d}]

Figure 3.5: This figure shows the block structure of $\sum \underline{x}\,\underline{x}^T$ if $\underline{x}$ is defined accordingly. The modalities are plotted horizontally and vertically; each block indicates a correlation between two image regions originating from the labeled modalities.
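The closed-form estimation of Eqn. (3.3) and the variance estimate ${}^{c}\sigma^2$ can be sketched in a few lines. This is a minimal sketch under our own assumptions: the mask $U_c$ is passed as a list of offsets (k, l, i), the estimation region is a rectangle, and the helper name `fit_3d1` is hypothetical. Each column of the design matrix is one shifted slice of the data, so each row is a vector $\underline{x}_{m,n}^T$.

```python
import numpy as np

def fit_3d1(x, mask, c, rows, cols):
    """Least-squares 3D/1 AR coefficients for channel c (hypothetical helper).

    x          : array of shape (M, N, d), the multi-modal image
    mask       : list of offsets (k, l, i) forming U_c, without (0, 0, c)
    rows, cols : half-open ranges (m0, m1), (n0, n1) of the estimation region
    Returns ^c a (ordered like mask) and ^c sigma^2.
    """
    if (0, 0, c) in mask:
        raise ValueError("U_c must not contain the point of reference")
    (m0, m1), (n0, n1) = rows, cols
    M, N, _ = x.shape
    for k, l, _ in mask:
        if m0 - k < 0 or m1 - k > M or n0 - l < 0 or n1 - l > N:
            raise ValueError("mask leaves the image inside the estimation region")
    # Column j holds x_{m-k_j, n-l_j, i_j} for all (m, n) in E,
    # so row r of X is the vector x_{m,n}^T of Eqn. (3.1).
    X = np.stack([x[m0 - k:m1 - k, n0 - l:n1 - l, i].ravel() for k, l, i in mask], axis=1)
    y = x[m0:m1, n0:n1, c].ravel()      # values x_{m,n,c} at the points of reference
    R = X.T @ X                         # sum over E of x x^T
    r = X.T @ y                         # sum over E of x x_{m,n,c}
    a = np.linalg.solve(R, r)           # Eqn. (3.3)
    e = y - X @ a                       # prediction errors e_{m,n,c}
    sigma2 = float(np.mean(e ** 2))     # 1 / c(E, U_c) * sum of e^2
    return a, sigma2
```

Solving the normal equations with `np.linalg.solve` avoids forming the inverse explicitly; for badly conditioned masks a least-squares solver such as `np.linalg.lstsq` on `X` would be the more robust choice.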






d · 3D/1

As the structure of the 3D/1 model shows, it can easily be extended to yield not a single scalar prediction but a vector. In principle, all parameters ($U_c$ and $\mathcal{E}$) may be modified in this process.

In the case of very different statistical properties between the modalities, which can easily be identified by analyzing $\sum \underline{x}\,\underline{x}^T$, it would be worthwhile to identify different sets of parameters, especially $U_c$, or even to split the multi-modal data so that they are processed by different models.

Classic 2D AR model 

The simple two-dimensional AR model is defined on scalar-valued data arranged in two dimensions. If the model is interpreted as an operator on a special multi-modal image, the multi-modal dataset would have the dimensions s = 2 and d = 1, and all the mathematics discussed in the previous section applies.



The 2D(/1) models require only one coefficient model mask; hence the index c may be dropped, turning $U_c$ into $U$ and ${}^{c}\underline{a}$ into $\underline{a}$.

Structure of $\sum \underline{x}\,\underline{x}^T$

The vector $\underline{x}$ encapsulates the multi-modal image content, sliced according to the coefficient model mask $U_c$. The axes are systematically varied according to the definition of $U_c$. This results in the block structure of $\sum \underline{x}\,\underline{x}^T$ that is visualized in Fig. 3.5.
The figure of $\sum \underline{x}\,\underline{x}^T$ clearly shows which blocks correlate with each other. Each block corresponds to an image region in one modality. On the diagonal, all modality blocks are correlated with themselves.
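The block structure can also be summarized numerically. The sketch below assembles $\sum \underline{x}\,\underline{x}^T$ for a mask that uses the same spatial offsets in every modality, ordered modality by modality as in Fig. 3.5, and reports the Frobenius norm of each block normalized by the diagonal blocks. This normalized "block strength" is our own illustrative summary, not a quantity defined in the paper, and it assumes roughly zero-mean data.

```python
import numpy as np

def modality_block_strength(x, offsets, rows, cols):
    """(d, d) summary of the block structure of sum x x^T (hypothetical measure).

    x       : (M, N, d) multi-modal image, assumed to be roughly zero-mean
    offsets : spatial offsets (k, l) used in every modality
    Entry (i, j) = ||block(i, j)||_F / sqrt(||block(i, i)||_F * ||block(j, j)||_F).
    """
    (m0, m1), (n0, n1) = rows, cols
    M, N, d = x.shape
    for k, l in offsets:
        if m0 - k < 0 or m1 - k > M or n0 - l < 0 or n1 - l > N:
            raise ValueError("mask leaves the image inside the estimation region")
    columns = []
    for i in range(d):                      # modality-major ordering of x
        for k, l in offsets:
            columns.append(x[m0 - k:m1 - k, n0 - l:n1 - l, i].ravel())
    X = np.stack(columns, axis=1)
    G = X.T @ X                             # sum over E of x x^T
    p = len(offsets)
    norms = np.array([[np.linalg.norm(G[i * p:(i + 1) * p, j * p:(j + 1) * p])
                       for j in range(d)] for i in range(d)])
    diag = np.sqrt(np.diag(norms))
    return norms / np.outer(diag, diag)
```

Values close to 1 off the diagonal point to strongly coupled modalities; values close to 0 indicate blocks that a 2D/d model or a split of the data could safely ignore.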



3.3 2D/d 



The 2D/d AR models are designed under the assumption that the characteristic texture is dominated by its relative spatial coordinates. This can easily be visualized with the example of stamped or embossed surfaces. In the process of stamping, a stamp is placed on the surface and a surface-surface interaction (e.g., a force) occurs. The resulting multi-modal texture is then dominated by the texture imposed by the stamp. In general, this is not a strict functional relation due to diffusion-like processes and special behaviors in the modalities. The 2D/d AR models simplify this assumption to a functional relation. This assumption states that the interchange across the modalities is negligible compared to the statistical relations across the spatial domain. This leads to a coefficient model mask that can be arranged in the spatial dimensions.

One special aspect of stamping or embossing are accidentally caused stamps or embossments, which could be related to anomalies or defects, e.g., caused by an accidentally slipped screwdriver.

The name 2D/d is based on the organization of the coefficient model mask and the dimension of the prediction. The coefficient model mask can be arranged in 2D, and the predicted value at the point of reference is d-dimensional and contains the prediction for all modalities simultaneously.



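$$\hat{\mathbf{x}}_{m,n} = \sum_{(k,l)\in U} a_{k,l}\, \mathbf{x}_{m-k,n-l} = A_{m,n}\,\underline{a}, \qquad \mathbf{x}_{m,n} = \hat{\mathbf{x}}_{m,n} + \sigma\, \mathbf{w}_{m,n} \qquad (3.4)$$

$$\mathbf{e}_{m,n} = \mathbf{x}_{m,n} - A_{m,n}\,\underline{a} = -A'_{m,n}\,\underline{a}'$$

The structures used are defined as follows:

$$A_{m,n} = \big(\mathbf{x}_{m-k_1,n-l_1},\ \mathbf{x}_{m-k_2,n-l_2},\ \ldots\big) \in \mathbb{R}^{d\times|U|}, \quad \underline{a} = (a_{k_1,l_1}, a_{k_2,l_2}, \ldots)^T, \quad (k_j, l_j) \in U,$$

$$A'_{m,n} = \big(\mathbf{x}_{m,n},\ A_{m,n}\big), \quad \underline{a}' = \begin{pmatrix} -1 \\ \underline{a} \end{pmatrix},$$

where $\mathbf{x}_{m,n} \in \mathbb{R}^{d\times 1}$ collects the values of all d modalities at location (m, n).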
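The optimization problem for the determination of an optimal $\underline{a}$ for the given data in the estimation region $\mathcal{E}$ is formulated as:

$$\Gamma = \frac{1}{c(\mathcal{E}, U)} \sum_{(m,n)\in\mathcal{E}} \mathbf{e}_{m,n}^T \mathbf{e}_{m,n} \rightarrow \min$$

As in the previous section, $c(\mathcal{E}, U) \in \mathbb{N}$ is the number of possible placements of the coefficient model mask $U$ within the estimation region $\mathcal{E}$. $\Gamma'$ is derived from $\Gamma$ by neglecting constant factors irrelevant for the determination of the optimum. Hence, $\Gamma'$ is given by

$$\Gamma' = \sum_{(m,n)\in\mathcal{E}} \mathbf{e}_{m,n}^T \mathbf{e}_{m,n} = \sum_{(m,n)\in\mathcal{E}} \Big( \mathbf{x}_{m,n}^T \mathbf{x}_{m,n} - 2\,\underline{a}^T A_{m,n}^T \mathbf{x}_{m,n} + \underline{a}^T A_{m,n}^T A_{m,n}\, \underline{a} \Big)$$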




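The roots of $\partial\Gamma'/\partial\underline{a} = \underline{0}$ yield the minimal $\Gamma'$ and thus also the minimal $\Gamma$. Determining $\underline{a}$ from these roots yields our optimal AR coefficients in the least squares sense according to:

$$\frac{\partial\Gamma'}{\partial\underline{a}} = -2\sum_{(m,n)\in\mathcal{E}} A_{m,n}^T \mathbf{x}_{m,n} + 2\sum_{(m,n)\in\mathcal{E}} A_{m,n}^T A_{m,n}\, \underline{a} = \underline{0} \quad\Rightarrow\quad \underline{a} = \Big(\sum_{(m,n)\in\mathcal{E}} A_{m,n}^T A_{m,n}\Big)^{-1} \sum_{(m,n)\in\mathcal{E}} A_{m,n}^T \mathbf{x}_{m,n} \qquad (3.5)$$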
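Eqn. (3.5) can be implemented without building each $A_{m,n}$ explicitly: every column of $A_{m,n}$ is a shifted copy of the whole multi-modal image, so the sums over $\mathcal{E}$ become tensor contractions. The following is a minimal sketch assuming a rectangular estimation region and offsets given as a list; the helper name `fit_2dd` is hypothetical.

```python
import numpy as np

def fit_2dd(x, offsets, rows, cols):
    """Least-squares 2D/d AR coefficients (hypothetical helper).

    x       : array (M, N, d); x[m, n] is the modal vector x_{m,n}
    offsets : list of (k, l) forming U, without (0, 0)
    Returns a (one coefficient per offset, shared by all modalities)
    and the mean squared prediction error.
    """
    if (0, 0) in offsets:
        raise ValueError("U must not contain the point of reference")
    (m0, m1), (n0, n1) = rows, cols
    M, N, d = x.shape
    for k, l in offsets:
        if m0 - k < 0 or m1 - k > M or n0 - l < 0 or n1 - l > N:
            raise ValueError("mask leaves the image inside the estimation region")
    # S[j, p] is x_{m-k_j, n-l_j} for the p-th location in E, i.e. column j of A_{m,n}
    S = np.stack([x[m0 - k:m1 - k, n0 - l:n1 - l, :].reshape(-1, d) for k, l in offsets])
    target = x[m0:m1, n0:n1, :].reshape(-1, d)      # x_{m,n}
    G = np.einsum('jpc,kpc->jk', S, S)               # sum over E of A^T A
    b = np.einsum('jpc,pc->j', S, target)            # sum over E of A^T x
    a = np.linalg.solve(G, b)                        # Eqn. (3.5)
    e = target - np.einsum('j,jpc->pc', a, S)        # e_{m,n} = x_{m,n} - A_{m,n} a
    return a, float(np.mean(e ** 2))
```

Applied to a synthetic causal process $\mathbf{x}_{m,n} = 0.5\,\mathbf{x}_{m,n-1} + 0.3\,\mathbf{x}_{m-1,n} + \mathbf{w}_{m,n}$, the estimate recovers coefficients close to 0.5 and 0.3 and a mean squared error close to the unit noise variance.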



Structure of $\sum A^T A$

To emphasize the distinction between the 3D/1 and the 2D/d models, we take a closer look at the properties of $\sum A^T A$ and compare it with the corresponding matrix $\sum \underline{x}\,\underline{x}^T$ of the 3D/1 AR model. The matrix A captures information about the image content in the modalities (row-wise) and in space (column-wise). Computing $\sum A^T A$ amounts to a statistical analysis of the captured data.

For simplification, the following notation is used: $A = (\mathbf{x}_{0,0}, \mathbf{x}_{0,1}, \ldots) \in \mathbb{R}^{d\times|U|}$ with $\mathbf{x}_{i_0,i_1} = ({}^{1}x_{i_0,i_1}, {}^{2}x_{i_0,i_1}, \ldots)^T \in \mathbb{R}^{d\times 1}$, where ${}^{c}x_{i_0,i_1}$ indicates the scalar value of modality c at location $(i_0, i_1)$.
                                                     d · 3D/1                                  2D/d
Number of AR coefficients                            d · |U_c|                                 |U|
Effort of prediction of one point of reference x     o·p·d² · t_mul + d(o·p·d - 1) · t_add     o·p·d · t_mul + d(o·p - 1) · t_add

Table 3.1: This table compares the different multi-modal AR models in complexity and numerical issues. $U_c$ is assumed to be o × p × d in size and U to be o × p.



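$$A^T A = \begin{pmatrix} \sum_{c} ({}^{c}x_{0,0})^2 & \sum_{c} {}^{c}x_{0,0}\,{}^{c}x_{0,1} & \cdots \\ \sum_{c} {}^{c}x_{0,0}\,{}^{c}x_{0,1} & \sum_{c} ({}^{c}x_{0,1})^2 & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} = A_{c=1}A_{c=1}^T + A_{c=2}A_{c=2}^T + \ldots = \sum_{i} \mathrm{Cov}\{A_{c=i}, A_{c=i}\} \qquad (3.6)$$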



The vector $A_{c=i} \in \mathbb{R}^{|U|\times 1}$ represents all information of one modality across all spatial locations within U; it is similar to $\underline{x}$. Equation (3.6) clearly shows that $A^T A$ is the sum of the covariance matrices of the spatial relations within each modality. From this equation we can easily see that the 2D/d models are robust against weak correlation between modalities and are therefore preferable in situations where the modalities in multi-modal data are weakly correlated. These models nevertheless enforce an interrelation between the modalities.
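The decomposition in Eqn. (3.6) is easy to confirm numerically. The short check below uses random data of illustrative size, takes $A_{c=i}$ as the i-th row of A written as a $|U| \times 1$ column vector, and compares $A^T A$ with the sum of the per-modality outer products.

```python
import numpy as np

rng = np.random.default_rng(4)
d, U_size = 3, 5
A = rng.standard_normal((d, U_size))   # columns: modal vectors x_{i0,i1} (random stand-ins)
lhs = A.T @ A                          # |U| x |U| matrix of Eqn. (3.6)
# A_{c=i} is row i of A, i.e. one modality across all spatial positions in U
rhs = sum(np.outer(A[i], A[i]) for i in range(d))
```

Each summand only couples spatial positions within one modality, which is why cross-modality correlations do not enter the 2D/d normal equations.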

Table 3.1 provides a short comparison between the d · 3D/1 and 2D/d AR models.
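The operation counts of Table 3.1 follow directly from the mask sizes. The helper below simply evaluates those expressions; the function name is hypothetical, and t_mul and t_add are treated as unit costs, so the returned numbers are counts of multiplications and additions.

```python
def prediction_cost(o, p, d):
    """Coefficient count and operations per point of reference, per Table 3.1.

    U_c is assumed to be o x p x d and U to be o x p in size.
    """
    d3d1 = {"coefficients": d * (o * p * d),   # d masks with |U_c| = o*p*d entries
            "mul": o * p * d ** 2,
            "add": d * (o * p * d - 1)}
    d2d = {"coefficients": o * p,              # one mask U shared by all modalities
           "mul": o * p * d,
           "add": d * (o * p - 1)}
    return d3d1, d2d
```

For a 7 × 5 spatial support and d = 4 modalities this gives 560 against 140 multiplications; the ratio grows linearly with d.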



4 Inspection Framework 

Input 

As stated before, the data processed by an AR model have to be weakly stationary. The reasons and exact preconditions are stated in Section 2. The first step is therefore to check for weak stationarity and, if necessary, to enforce it. It can be enforced in different ways, e.g., by removing previously known structures that violate the condition. [BL98] propose a method that is able to remove random lines from an image.

Structural output

For compatibility reasons, the two multi-modal AR models have to produce the same structural data as the result of their filtering. The 3D/1 model is not compatible with the 2D/d models, as the 3D/1 models predict only a scalar value, while the 2D/d models predict a vector. To derive a compatible output, the 3D/1 model is repeated d times with appropriate modifications, such that each model is responsible for the prediction of only one modality. The resulting model is called d · 3D/1. Consequently, the d · 3D/1 model must have d different coefficient model masks $U_c$ and points of reference.

[Figure 4.1 blocks: Input, Processing, Classification]

Figure 4.1: Scheme of the framework described in Section 4.

One simple method to achieve this is to build a special series of coefficient model masks that do not take into account any data along the non-primal (modal) dimension at the point of reference (see Fig. 3.2). The resulting coefficient model masks are therefore the same. Only the point of reference, which also has to be modified, differs between those models. Moreover, if the coefficient model masks are equal, $U_{c=1} = U_{c=2} = \ldots$, the vectors $\underline{x}$ of all models are equal.
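Because the vectors $\underline{x}$ coincide when all masks are equal, the d systems of Eqn. (3.3) share one matrix $\sum \underline{x}\,\underline{x}^T$ and can be solved together with a multi-column right-hand side. The sketch below assumes a shared mask without any data at the spatial point of reference; the helper name `fit_d3d1_shared` is hypothetical.

```python
import numpy as np

def fit_d3d1_shared(x, mask, rows, cols):
    """All d 3D/1 models at once for one shared mask U_c (hypothetical helper).

    mask : list of (k, l, i) with (k, l) != (0, 0), i.e. no data at the
           spatial position of the point of reference in any modality.
    Returns a matrix of shape (|U_c|, d); column c holds ^c a.
    """
    if any(k == 0 and l == 0 for k, l, _ in mask):
        raise ValueError("shared mask must exclude the spatial point of reference")
    (m0, m1), (n0, n1) = rows, cols
    d = x.shape[2]
    # The regressor vectors x_{m,n} are identical for every c, so the matrix
    # sum x x^T is built once and reused for all d right-hand sides.
    X = np.stack([x[m0 - k:m1 - k, n0 - l:n1 - l, i].ravel() for k, l, i in mask], axis=1)
    Y = x[m0:m1, n0:n1, :].reshape(-1, d)   # targets x_{m,n,c}, one column per c
    R = X.T @ X
    return np.linalg.solve(R, X.T @ Y)       # Eqn. (3.3) for all c simultaneously
```

The result matches d separate least-squares fits, but the shared matrix is assembled and factorized only once instead of d times.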

Classification 

Once the structure of the output is comparable, the output has to be classified to yield an indicator that qualifies whether a region is free of anomalies or not. The literature names many different methods to achieve this distinction, for instance neural networks, support vector machines or support vector data description [Tax01].

In [TQD86] a method named CFAR (Constant False Alarm Rate) is proposed that squares the prediction error, divides it by the local variance and applies a threshold to the resulting value. The threshold is derived from a region known to be free of anomalies.
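A CFAR-style decision along these lines can be sketched as follows. Several details are our own assumptions rather than the exact procedure of [TQD86]: the prediction errors are taken to be zero-mean, the local variance is the mean squared error in a square window of radius r with the centre pixel excluded, and the threshold is the (1 - p_fa) quantile of the statistic inside a user-supplied anomaly-free reference region.

```python
import numpy as np

def box_sum(a, r):
    """Sum of a over a (2r+1) x (2r+1) window around every pixel (reflective border)."""
    p = np.pad(a, r, mode="reflect")
    c = np.cumsum(np.cumsum(p, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))            # c[i, j] = sum of p[:i, :j]
    w = 2 * r + 1
    return c[w:, w:] - c[:-w, w:] - c[w:, :-w] + c[:-w, :-w]

def cfar_detect(e, r, reference, p_fa):
    """e: prediction-error image; r: window radius; reference: boolean mask of an
    anomaly-free region; p_fa: desired false alarm rate (all assumptions)."""
    n = (2 * r + 1) ** 2 - 1
    e2 = e ** 2
    local_var = (box_sum(e2, r) - e2) / n      # variance of the surrounding errors
    t = e2 / np.maximum(local_var, np.finfo(float).tiny)
    thr = np.quantile(t[reference], 1.0 - p_fa)
    return t > thr, thr
```

Excluding the centre pixel keeps a single strong anomaly from inflating its own normalization; a guard band around the centre, as used in many CFAR variants, would be a natural extension.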



A different approach exploits the ability of more advanced classifiers to distinguish not only single feature vectors but also combinations of feature vectors. The output of the AR filtering may also be combined with other feature-generating filters, or it may be used as input for feature-generating processes.



The framework discussed here is depicted in Fig. 4.1.
Figure 5.1: This figure shows $\sum \underline{x}\,\underline{x}^T$ for real-world data. The different correlations within one modality and between modalities are clearly visible. Brighter color indicates stronger correlation. The yellow line on the diagonal visualizes the autocovariance. We can easily see that the correlative information between the first five modalities is not negligible.



5 Experimental Results 



The experiments were conducted on a small part of a cylinder barrel of a combustion engine. The multi-modal data were generated using a variation of deflectometry². [HWMHB09] provide a complete introduction to deflectometry, including some insights into its problems and into geometric reconstruction. The multi-modal data derived here contain nine modalities.



The coefficient model mask $U_c$ used for the calculation of $\underline{x}$ is depicted in Fig. 3.3. $\sum \underline{x}\,\underline{x}^T$ is visualized in Fig. 5.1. It shows that not all modalities correlate well with each other. In fact, the figure shows that the autocorrelation in weakly correlated modalities is smaller than the autocorrelation of modalities that correlate well with some other modalities.



The d · 3D/1 AR models can cope with these data. Nevertheless, the optimization problem has a high dimension and may be numerically unstable. The 2D/d AR models take advantage of the weakly correlated blocks shown in Fig. 5.1, but they discard the information in the cross-modality (off-diagonal) blocks. The data may be split according to their statistical nature, and each fragment may be processed separately and optimally by one d · 3D/1 model or 2D/d AR model. The visible difference in the prediction error of both models is negligible.



2 Deflectometry is performed on highly specular surfaces. Typical information gained from deflectometric measurements includes, for example, synthetic gray values and hints about the surface inclination.
6 Conclusion and Future Work

We presented two AR models that are specially designed for multi-modal setups, which are gaining importance in visual inspection. In automated visual inspection, as in every image-based algorithm, regions which are known to be less trustworthy have to be explicitly taken into account in the calculation. In addition, methods that indicate which model should be used would be helpful.
Abstract: The main principle of shape from specular surface acquisition
is to use a highly controllable environment, where a screen on which a 
well-defined pattern is presented is observed via the specular reflecting sur- 
face. Knowing that pattern, it is possible — at least with certain additional 
knowledge — to reconstruct the surface under test. In this paper, we discuss 
two aspects of this principle: first, a new iterative algorithm for shape recon- 
struction using surface normal data is introduced, and second, some rules of 
thumb for the experimental design of inspection systems are derived from the 
investigation of the normal field induced by measurement. 



1 Introduction 

Let us consider our knowledge about object surfaces from a technical and visual 
perspective. There are two aspects we can distinguish: the reflectance and the 
shape. 

Once the bidirectional reflectance distribution function (BRDF) is known for every point of the surface, all information about the reflectance properties of the surface is available. The BRDF is a function of the geometric arrangement of the illumination and the observation relative to the surface normal. It describes how bright the surface will appear in proportion to a given irradiance. Many types of automated visual inspection methods for industrial surfaces employ the knowledge of the BRDF implicitly.
Other surface properties that can be evaluated are geometric properties, which ultimately describe the object's shape. The knowledge about this aspect is usually represented through an object model. The simplest form of such a model is a 3D point cloud, which consists of the raw data, i.e., 3D points of the object's surface, that are obtained directly through the measurement process. Finding a more appropriate model for given raw data is one of the fundamental problems in the reconstruction of 3D objects in computer graphics [Rem03, Hop94]. In the context of industrial automated visual inspection systems, we can distinguish at least two inspection tasks concerning object shape: first, how well does the global geometry of the object under test fit its designed shape, and second, is there a local object deformation?



How do these aspects appear in the context of the considered inspection task, the inspection of specular surfaces? The at least partially specular nature of the objects under test implies the validity of the law of reflection of geometric optics. This knowledge is the only assumption we will make. Hence, for specular surfaces, the BRDF is well known. For partially specular surfaces, it is possible to model the reflection through a specular and a diffuse component. For many types of practically relevant surfaces, it is sufficient for automated visual inspection to employ the well-known Phong shading model, which assumes a perfectly diffuse reflection component and a specular component that decays polynomially from the ideal specular direction [Pho75]. In either case, it is assumed that it is possible to determine the direction of the specular reflection.
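The Phong model mentioned above can be written down compactly. The sketch below is the textbook form without an ambient term: a Lambertian part $k_d \cos\theta$ plus a specular part $k_s \cos^\alpha$ of the angle between the ideal reflection direction and the viewing direction. The parameter names and values are illustrative assumptions, and cases with the light behind the surface are not treated specially.

```python
import numpy as np

def phong_reflectance(n, l, v, k_d, k_s, alpha):
    """Relative brightness under the Phong model (ambient term omitted).

    n: surface normal, l: direction to the light, v: direction to the viewer
    (3-vectors, normalized here). k_d, k_s, alpha: hypothetical material
    parameters (diffuse weight, specular weight, specular exponent).
    """
    n, l, v = (np.asarray(u, float) / np.linalg.norm(u) for u in (n, l, v))
    diffuse = max(0.0, n @ l)                  # perfectly diffuse (Lambertian) part
    r = 2.0 * (n @ l) * n - l                  # ideal specular direction
    specular = max(0.0, r @ v) ** alpha        # decays polynomially away from r
    return k_d * diffuse + k_s * specular
```

Viewing 60° away from the ideal specular direction with alpha = 10 reduces the specular term by a factor of $0.5^{10} \approx 10^{-3}$, which illustrates how sharply the highlight is localized for large exponents.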



For the shape from specular reflection problem, we will assume that the reflectance properties of the objects under test are known, and we will focus on the determination of the object shape. The main challenge in the field of automated visual inspection of specular surfaces can be stated as follows: how can we gather information about a surface only by assuming its specular property? We use the term deflectometry for the family of such methods, which evaluate reflected images of a priori known patterns to retrieve information about specular shapes. According to this definition, shape from specular reflection is a special case of deflectometry, focusing only on 3D shape reconstruction, whereas other aspects of deflectometry, like the estimation of surface deviations, are ignored.

The basic deflectometric principle can be described as follows: a presumably distorted (deflected) image of a well-known and calibrated scene or light source L is captured with an image acquisition device C such that the light path includes the unknown specular surface S, cf. Figure 1.1. Knowing the intrinsic and extrinsic parameters of the camera, the light source, and the image acquisition constellation, it is possible to obtain normals of the unknown surface. The mapping l from sight ray to scene point, i.e., the deflectometric measurement, is usually obtained by a unique coding of the scene positions. It is well known that the shape from specular reflection problem for such a simple setup is mathematically ill-posed and that additional knowledge is required for its regularization [Bal08].

Figure 1.1: General geometric setup for deflectometric inspection.



1.1 Contributions and Structure 



There are two main contributions, which structure the report at hand: 



1. A novel approach to the reconstruction problem for specular surfaces is given in Section 2. Solving a nonlinear Poisson equation iteratively by means of finite element methods yields a robust and fast-converging method for industrial inspection tasks.

2. It is well known that observing specular surfaces in the deflectometric manner yields normal fields [Bal08]. Examining those fields is fruitful for the experimental design of specular inspection systems. The answer to the question of where to place the camera and the screen in such a setup is given in Section 3.
2 Shape from Specular Reflection



Because of the assumption that the reflection law holds for each surface point, the following relation

$$\hat{n} = \frac{\hat{s}_r - \hat{s}}{\|\hat{s}_r - \hat{s}\|}$$

holds between the sight ray s to the surface S, the reflected ray $s_r$, and the local surface normal n (vectors of unit length are marked by an additional hat, i.e., $\|\hat{x}\| = 1$). With $s_r = l - s$ (see Figure 1.1), this yields the relation (2.1) between the possible surface normals $\hat{n}_m$ due to the measurement and the measurement l(u) itself, with $u \in U \subset \mathbb{R}^2$ for all $x \in \Omega$, $\Omega = \{x \mid x \in \mathbb{R}^3 \wedge P(x) \in U\}$:

$$\hat{n}_m(x) = \frac{\dfrac{l(P(x)) - x}{\|l(P(x)) - x\|} - \dfrac{x}{\|x\|}}{\left\|\dfrac{l(P(x)) - x}{\|l(P(x)) - x\|} - \dfrac{x}{\|x\|}\right\|} =: m\big(x, l(P(x))\big) \qquad (2.1)$$

Here $P: \mathbb{R}^3 \to \mathbb{R}^2$ denotes the projection $P(x) := (x_1/x_3,\ x_2/x_3)^T$, $x = (x_1, x_2, x_3)^T$, u denotes points in the image plane of the camera, and Ω describes the camera's sight cone.
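Eqn. (2.1) translates directly into code. The sketch below assumes, as the relation $s_r = l - s$ suggests, that the camera centre lies at the origin, so that the sight ray to a candidate point x is x itself; `measured_normal` is a hypothetical name, and the decoded scene point l(P(x)) is passed in directly.

```python
import numpy as np

def measured_normal(x, l_point):
    """Normal hypothesis n_m(x) of Eqn. (2.1), camera centred at the origin.

    x       : candidate surface point on the sight ray (3-vector)
    l_point : decoded scene point l(P(x)) observed along that sight ray
    """
    x = np.asarray(x, float)
    l_point = np.asarray(l_point, float)
    s_hat = x / np.linalg.norm(x)              # unit sight ray
    r = l_point - x
    s_r_hat = r / np.linalg.norm(r)            # unit reflected ray
    n = s_r_hat - s_hat                        # reflection law: n is parallel to s_r_hat - s_hat
    return n / np.linalg.norm(n)
```

Evaluating the function at several depths along one sight ray, with the same decoded point, yields different normals; this is exactly the depth/normal ambiguity that makes the problem ill-posed.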

Note that for all $x \in S$ and for all undisturbed measurements,

$$\hat{n}(x) = \hat{n}_m(x) \qquad (2.2)$$

must hold. For all $x \in \Omega \setminus S$, $\hat{n}_m(x)$ is a possible surface normal of a hypothetical surface $\tilde{S}$, in the sense that $\tilde{S}$ would lead to the same deflectometric measurement l(u) as S, cf. Figure 2.1.

We can summarize: the deflectometric measurement establishes a normal field $\hat{n}_m(x)$, so that the shape from specular reflection problem reads as follows: find the very surface which fits into this measured normal field.



2.1 Shape Reconstruction for Surface Graph Representations 



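The surface S can be described as the graph of a function $f: (x, y) \mapsto f(x, y)$, $\mathbb{R}^2 \supset \Omega_{xy} \to \mathbb{R}$, using the parametrization

$$S = \{(x, y, z)^T \mid z = f(x, y)\}.$$

The induced normal field is

$$\hat{n} = \frac{1}{\sqrt{(\partial_x f)^2 + (\partial_y f)^2 + 1}} \begin{pmatrix} -\partial_x f \\ -\partial_y f \\ 1 \end{pmatrix}. \qquad (2.3)$$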
Figure 2.1: Example of a solution manifold: hypothetical surfaces $\tilde{S}$ (solid lines) and the real surface S (dashed line) are presented as a family of function graphs $f_x$.



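For each surface point, Equation (2.2) must hold. This leads to the nonlinear deflectometric partial differential equation (PDE)

$$-\nabla f(x, y) = \begin{pmatrix} \hat{n}_{m,1}/\hat{n}_{m,3} \\ \hat{n}_{m,2}/\hat{n}_{m,3} \end{pmatrix} =: q(x, y, f) = \begin{pmatrix} q_1(x, y, f(x, y)) \\ q_2(x, y, f(x, y)) \end{pmatrix}. \qquad (2.4)$$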
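The link between Eqns. (2.3) and (2.4) can be checked with two small helpers: one builds the graph normal from the partial derivatives, and the other projects a normal to $q = (\hat{n}_1/\hat{n}_3, \hat{n}_2/\hat{n}_3)$. The helper names are hypothetical. The round trip returns $-\nabla f$, and q does not depend on the sign convention of the normal.

```python
import numpy as np

def normal_from_gradient(fx, fy):
    """Unit normal of the graph z = f(x, y) from its partial derivatives, Eqn. (2.3)."""
    n = np.array([-fx, -fy, 1.0])
    return n / np.sqrt(fx ** 2 + fy ** 2 + 1.0)

def q_from_normal(n):
    """Projected normal field q = (n_1/n_3, n_2/n_3), right-hand side of Eqn. (2.4)."""
    n = np.asarray(n, float)
    if abs(n[2]) < 1e-12:
        raise ValueError("normal is parallel to the x-y plane; the graph form breaks down")
    return n[:2] / n[2]
```

For the plane f = 0.1x - 0.25y + 1, for example, the round trip yields q = (-0.1, 0.25) = -∇f.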



Many deflectometric reconstruction approaches use a linear variant of this equation, see for example Massig [Mas01], which implicitly leads to some regularization² for selecting the correct normals $\hat{n}_s$ of the real surface out of the normal field $\hat{n}_m$ via a mapping

$$C: (x, y)^T \mapsto \hat{n}_s(x, y), \quad \mathbb{R}^2 \to \mathbb{R}^3, \qquad \hat{n}_s(x_0, y_0) \in \{\hat{n}_m(x, y, z) \mid x = x_0,\ y = y_0\}.$$



2 Selecting the correct normals in this sense is commonly done by a stereo approach [KKH04, PT04], which can therefore be thought of as a linearization process, since it eliminates the f(x, y) dependency of the projected normal field q(x, y, f).
With the surface representation of Equation (2.3), this mapping yields the linear variant of problem (2.4) according to

$$-\nabla f(x, y) = q(x, y).$$

Here we have to point out that selecting the normals $\hat{n}_s$ alone is not sufficient for reconstructing the unknown surface, since we need initial and/or boundary values to solve the reconstruction problem [BWBB07]. Furthermore, the normal field is not necessarily curl-free, which implies that a potential f might not exist and only an approximate solution can be obtained.

In [WMHB09] the author of this report gives a short overview of commonly used normal field integration methods in the context of deflectometry.

It is possible to utilize the nonlinear problem (2.4) directly: it can be transformed into a scalar PDE by applying the divergence operator,

$$
-\Delta f(x, y) = \operatorname{div} q(x, y, f). \tag{2.5}
$$

In the following, an iterative solution approach for this equation is presented. For common inspection problems one can observe that the right-hand side of Equation (2.5) depends only weakly on $f(x, y)$. In Section 3 the vector gradient of the normal field in a given direction is analyzed further. To eliminate the dependency on $f(x, y)$ in the divergence term, one can choose an estimate for the surface. This can be done in several ways. First, model-driven: in industrial inspection tasks the inspected object is always known, and one is mainly interested in shape deviations. Second, the problem can be linearized by approximating the surface $f(x, y)$ with a plane; in an upcoming paper we will show that, with a well-designed inspection setup, the plane cut through the normal field leads to a negligible error for visual inspection tasks. Third, problem (2.5) can be linearized by successively solving the following linear Neumann problems for $i > 0$:

$$
\begin{aligned}
-\Delta f_i(x, y) &= \operatorname{div} q(x, y, f_{i-1}), && (x, y)^T \in \Omega_{xy},\\
\langle \nabla f_i(x, y) \mid \nu \rangle &= \langle -q(x, y, f_{i-1}) \mid \nu \rangle, && (x, y)^T \in \partial\Omega_{xy},
\end{aligned} \tag{2.6}
$$

with the initial surface

$$
f_0(x, y) = c(x, y),
$$

where $c(x, y)$ could be chosen as a constant function $c(x, y) = \text{const}$, and $\nu$ denotes the outer normal to $\Omega_{xy}$, i.e., $\langle \nabla f_i \mid \nu \rangle = \partial f_i / \partial \nu$, with the scalar product $\langle \cdot \mid \cdot \rangle$.

From the theory of partial differential equations it is well known that a variational formulation of problem (2.6) exists [BL05], which admits weak solutions. This means that, instead of solving (2.4) directly, one looks for a solution $f_i(x, y) \in H^1(\Omega_{xy})$ of the equivalent variational problem [Ste08]



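$$
a(f_i, v) = Q(v) \quad \forall v \in V_0, \tag{2.7}
$$

with the bilinear form

$$
a(f_i, v) = \int_{\Omega_{xy}} \langle \nabla f_i \mid \nabla v \rangle \, dx, \tag{2.8}
$$

and the linear form

$$
Q(v) = \int_{\Omega_{xy}} \operatorname{div}(q)\, v \, dx - \int_{\partial\Omega_{xy}} \langle q \mid \nu \rangle\, v \, do. \tag{2.9}
$$

Here $H^1(\Omega_{xy})$ denotes the Sobolev space $H^1$ over $\Omega_{xy}$, $V_0$ the space of test functions, and $d\mathbf{o} = \nu \, do$ an oriented element of $\partial\Omega_{xy}$.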

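For symmetric and positive bilinear forms such as (2.8), the variational problem (2.7) is equivalent to the following minimum problem:

$$
J(f_i) = \inf_{g \in H^1(\Omega_{xy})} J(g), \qquad f_i \in H^1(\Omega_{xy}),
$$

with the Ritz energy functional

$$
J(g) = \tfrac{1}{2}\, a(g, g) - Q(g).
$$

Inserting Equations (2.8) and (2.9) into the energy functional yields

$$
J(g) = \frac{1}{2} \int_{\Omega_{xy}} \langle \nabla g \mid \nabla g \rangle \, dx - \int_{\Omega_{xy}} \operatorname{div}(q)\, g \, dx + \int_{\partial\Omega_{xy}} \langle q \mid \nu \rangle\, g \, do.
$$

With integration by parts of the divergence term according to

$$
-\int_{\Omega_{xy}} \operatorname{div}(q)\, g \, dx = \int_{\Omega_{xy}} \langle q \mid \nabla g \rangle \, dx - \int_{\partial\Omega_{xy}} \langle q \mid \nu \rangle\, g \, do,
$$

the functional becomes

$$
J(g) = \int_{\Omega_{xy}} \left( \tfrac{1}{2} \|\nabla g\|^2 + \langle \nabla g \mid q \rangle \right) dx.
$$

Adding the positive constant term $\tfrac{1}{2} \int_{\Omega_{xy}} \|q\|^2 \, dx$ (footnote 3) leads to the final result

$$
J(g) = \frac{1}{2} \int_{\Omega_{xy}} \|-\nabla g - q\|^2 \, dx. \tag{2.10}
$$

3 This does not change the minimizer of the functional, since the added integral depends only on the measurement $n_m$.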
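The minimization of (2.10) can be illustrated with a discrete stand-in. The Python sketch below is not the FEM approach proposed in this report: it assumes a regular grid with spacing `h`, replaces the gradient by forward differences, and solves the resulting linear least-squares problem with NumPy. Its minimizer is determined only up to a constant, mirroring the pure Neumann problem (2.6); `np.linalg.lstsq` returns the minimum-norm, i.e., zero-mean, representative. The helper names are illustrative.

```python
import numpy as np

def diff_matrix(n, h):
    """(n-1) x n forward-difference operator: row k gives (f[k+1] - f[k]) / h."""
    return (np.eye(n - 1, n, k=1) - np.eye(n - 1, n)) / h

def integrate_field(qx, qy, h):
    """Minimise sum ||-grad g - q||^2 over grid functions g (discrete Eq. 2.10)."""
    ny, nx = qx.shape
    dx = np.kron(np.eye(ny), diff_matrix(nx, h))   # differences along x (columns)
    dy = np.kron(diff_matrix(ny, h), np.eye(nx))   # differences along y (rows)
    # Evaluate q at the midpoints where the forward differences live.
    qx_mid = 0.5 * (qx[:, :-1] + qx[:, 1:])
    qy_mid = 0.5 * (qy[:-1, :] + qy[1:, :])
    a = np.vstack([dx, dy])
    rhs = -np.concatenate([qx_mid.ravel(), qy_mid.ravel()])
    g, *_ = np.linalg.lstsq(a, rhs, rcond=None)    # minimum-norm solution has zero mean
    return g.reshape(ny, nx)

# Hypothetical check: recover a quadratic surface from q = -grad f.
h = 0.1
x, y = np.meshgrid(np.arange(12) * h, np.arange(10) * h)
f_true = 0.5 * x**2 + 0.3 * x * y - 0.2 * y**2
qx, qy = -(x + 0.3 * y), -(0.3 * x - 0.4 * y)
g = integrate_field(qx, qy, h)
print(np.allclose(g - g.mean(), f_true - f_true.mean()))  # True
```

No boundary condition is imposed explicitly: the Neumann condition of (2.6) is the natural boundary condition of the energy (2.10), so the least-squares formulation respects it implicitly.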
Figure 2.2: Adjusting the surface $f_i$ according to minimal distances to given points.

Solving the variational formulation (2.7) of the Neumann problem yields a normal adaptation problem: find the function $f_i \in \{g \mid g \in H^1\}$ such that the normals to its graph fit best to the given measurement. This is the deeper justification for applying the divergence operator to Equation (2.4). A second justification follows from the Hodge decomposition of a vector field, $F = \nabla\varphi + \nabla \times A + H$, which decomposes $F$ into a rotation-free part $\nabla\varphi$ (with scalar potential $\varphi$), a divergence-free part $\nabla \times A$ (with vector potential $A$) and a harmonic component $H$, cf. [TLHD03]. Applying the decomposition to the measurement-dependent field $q$, one can interpret Equation (2.5) as a normalization of the vector field $q$ that ensures integrability, that is, the existence of the scalar potential $f$.

For solving problem (2.6) we propose finite element methods (FEM) [BS06] for the following reasons:

1. By solving the corresponding variational problem (2.7), weak solutions can be obtained, which allows the reconstruction of surfaces that are not necessarily differentiable in the classical sense.

2. FEM uses a triangulation of the region of interest $\Omega_{xy}$, so irregular borders and regions where the determination of the normal field fails can be handled.

3. FEM is computationally efficient due to ansatz functions with local support.

4. Adaptive mesh refinement methods can be employed.

5. For a further computational speed-up, multigrid methods can be applied.

6. FEM is an industrial standard for solving PDEs, so robust software packages exist.
Because the deflectometric reconstruction problem for a monocular setup is ill-conditioned in a mathematical sense, further information is required, i.e., initial and/or border values are needed. Some common regularization procedures are described in [WMHB09]. Given additional surface points, this information can be combined with the solution of Equation (2.6) in an optimal manner to select the correct surface out of the solution manifold. The surface $f_i$ solves a linear problem with pure Neumann boundary conditions; therefore $f_i + \text{const}$ also solves the Neumann problem (2.6). Hence, the surface $f_i$ can be placed such that the distances $d_k = z_k - f_i(x_k, y_k)$ from the points $p_k = (x_k, y_k, z_k)^T$ to the surface $f_i$ are minimal, cf. Figure 2.2:

$$
\sum_k d_k^2 = \sum_k \big( p_{k,z} - f_i(x_k, y_k) \big)^2 \to \min. \tag{2.11}
$$



This leads to an optimal solution in a twofold sense: first, regarding the surface normals, the normals to the solution surface fit optimally to the measured normal field $n_m$, cf. Equation (2.10); second, regarding position-dependent information, the surface itself has minimal distance to the measured surface points $p_k$, cf. Equation (2.11). This approach allows the combination of surface gradient methods with position-sensitive methods.
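Because only a constant offset is free, the minimization in (2.11) has a closed-form solution: setting the derivative of $\sum_k (z_k - f_i(x_k, y_k) - c)^2$ with respect to $c$ to zero gives the mean residual. The following is a minimal Python sketch. It assumes the surface has already been evaluated at the point positions $(x_k, y_k)$, and the function name is illustrative.

```python
import numpy as np

def fit_offset(f_at_points, z_points):
    """Offset c minimising sum_k (z_k - f(x_k, y_k) - c)^2, cf. Eq. (2.11).

    The normal equation d/dc = -2 * sum_k (z_k - f_k - c) = 0 yields the mean residual.
    """
    residuals = np.asarray(z_points, dtype=float) - np.asarray(f_at_points, dtype=float)
    return float(residuals.mean())

f_k = [1.0, 2.0, 3.0]   # hypothetical surface values f_i(x_k, y_k)
z_k = [1.5, 2.4, 3.7]   # hypothetical measured heights p_{k,z}
c = fit_offset(f_k, z_k)
print(round(c, 4))      # 0.5333
```

Averaging the residuals is also what makes the offset robust to zero-mean noise on the point heights.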



Finally, we summarize our approach to the shape reconstruction problem in Algorithm 2.1.



Figure 2.3 shows the solution of Equation (2.7) for partially specular objects. Here the initial solution $f_0$ was a plane. Shown are the reconstructed surfaces $f_4$ after the fourth iteration of Algorithm 2.1. Even in strongly convex and concave regions, e.g., the tines of the fork, a clear solution is obtained.



3 Experimental Design 

In Section 2 the connection between the deflectometric measurement and its induced normal field is derived. The question that remains open is which geometric setup is optimal for deflectometric measurement with regard to the characteristics of the normal field. Some of these characteristics are presented by simulating a realistic example, the inspection of truck side mirrors.

The example surface is the following sphere:

$$
S = \{(x, y, z)^T \mid z = \sqrt{200^2 - (x - 100)^2 - y^2} + 190\}.
$$
Algorithm 2.1 Iterative specular shape reconstruction.

1: Given: a region of interest $\Omega \subset \mathbb{R}^3$, the induced normal field $n_m(\Omega)$ and a non-empty set of points $\{p_k \mid p_k \in \Omega\} \neq \emptyset$.
2: Select $f_i = \text{const}$ (information about the measurement setup can be used).
3: Select the termination threshold $\varepsilon$.
4: Set up the FEM system (integration scheme, linear solver, basis functions).
5: repeat
6:    $f_{i-1} \leftarrow f_i$
7:    Generate a mesh $M$ on $\Omega \cap f_{i-1} \to (\Omega_{xy}, M)$.
8:    Calculate $\operatorname{div} q$ at the nodes of $M$.
9:    Calculate $q$ at the boundary nodes of $M$.
10:   Solve problem (2.6) with FEM on $M \to f_i$.
11:   Solve problem (2.11) for $f_i$ and $\{p_k\} \to f_i$.
12: until $\|f_i - f_{i-1}\| < \varepsilon$
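The loop of Algorithm 2.1 can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the authors' implementation. The mesh of step 7 is replaced by a fixed regular grid, and the FEM solve of step 10 by a finite-difference least-squares minimization of (2.10). The regularization points $p_k$ are assumed to lie on grid nodes, and the caller supplies a hypothetical function `q_of_f` that evaluates $q(x, y, f)$ on the previous surface (steps 8 and 9).

```python
import numpy as np

def _diff(n, h):
    # (n-1) x n forward differences: row k gives (f[k+1] - f[k]) / h
    return (np.eye(n - 1, n, k=1) - np.eye(n - 1, n)) / h

def _integrate(qx, qy, h):
    # Discrete minimiser of Eq. (2.10); stands in for the FEM solve of Eq. (2.6).
    ny, nx = qx.shape
    a = np.vstack([np.kron(np.eye(ny), _diff(nx, h)), np.kron(_diff(ny, h), np.eye(nx))])
    rhs = -np.concatenate([(0.5 * (qx[:, :-1] + qx[:, 1:])).ravel(),
                           (0.5 * (qy[:-1, :] + qy[1:, :])).ravel()])
    return np.linalg.lstsq(a, rhs, rcond=None)[0].reshape(ny, nx)

def reconstruct(q_of_f, points, f_init, h, eps=1e-6, max_iter=50):
    """Fixed-point iteration in the spirit of Algorithm 2.1.

    q_of_f: hypothetical callable f -> (qx, qy) evaluating q(x, y, f) on the grid
    points: iterable of (row, col, z) regularisation points on grid nodes
    """
    rows, cols, z = (np.asarray(c) for c in zip(*points))
    f = np.array(f_init, dtype=float)              # step 2: initial surface
    for _ in range(max_iter):
        f_prev = f                                 # step 6
        qx, qy = q_of_f(f_prev)                    # steps 8-9: field on previous surface
        f = _integrate(qx, qy, h)                  # step 10: surface up to a constant
        f = f + np.mean(z - f[rows, cols])         # step 11: offset from Eq. (2.11)
        if np.max(np.abs(f - f_prev)) < eps:       # step 12: termination test
            break
    return f

# Usage with a hypothetical field that does not depend on f.
h = 0.1
x, y = np.meshgrid(np.arange(8) * h, np.arange(6) * h)
f_true = 0.4 * x**2 + 0.2 * y + 2.0
q = (-0.8 * x, -0.2 * np.ones_like(y))
points = [(0, 0, f_true[0, 0]), (5, 7, f_true[5, 7])]
f_rec = reconstruct(lambda f: q, points, np.zeros_like(x), h)
print(np.allclose(f_rec, f_true))  # True
```

With a field that does not depend on $f$, the loop terminates after the second pass; a real induced field would make `q_of_f` depend on the previous surface.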




Figure 2.3: Real-world examples (left column) and corresponding FEM solutions of Equation (2.5) (right column).



Image acquisition device: camera with focal length = 30 mm, pixel size = 6 µm, resolution = 1000 × 1000 pixels. The camera is located at (0, 0, 150)^T mm and looks at (150, 0, 0)^T mm.
Figure 3.1: Example of a solution manifold (a) and the corresponding curvature (b), all solving the problem of Equation (2.4).



Light source: LCD with pixel size = 0.294 mm, resolution = 1024 × 768 pixels.
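The setup parameters above translate into a few derived quantities. The Python sketch below is plain arithmetic under a pinhole assumption. It uses the focal length as the image distance and computes the pixel footprint for a fronto-parallel patch at the look-at point; these simplifications are ours, not the report's.

```python
import math

focal_mm = 30.0
pixel_mm = 6e-3                      # 6 µm pixel pitch
n_pixels = 1000
camera = (0.0, 0.0, 150.0)           # camera center in mm
look_at = (150.0, 0.0, 0.0)          # look-at point in mm

sensor_mm = n_pixels * pixel_mm                                      # sensor width
fov_deg = math.degrees(2.0 * math.atan(0.5 * sensor_mm / focal_mm))  # full field of view
ifov_mrad = 1e3 * pixel_mm / focal_mm                                # viewing angle per pixel
distance_mm = math.dist(camera, look_at)                             # camera to look-at point
footprint_mm = distance_mm * pixel_mm / focal_mm                     # pixel size on the object
lcd_w_mm, lcd_h_mm = 1024 * 0.294, 768 * 0.294                       # active LCD area

print(f"FOV {fov_deg:.2f} deg, IFOV {ifov_mrad:.2f} mrad, distance {distance_mm:.1f} mm")
# FOV 11.42 deg, IFOV 0.20 mrad, distance 212.1 mm
print(f"footprint {footprint_mm:.4f} mm, LCD {lcd_w_mm:.1f} x {lcd_h_mm:.1f} mm")
# footprint 0.0424 mm, LCD 301.1 x 225.8 mm
```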



For this setup, examples of the solution manifold are shown in Figure 3.1(a). The family of solutions $\{f_\lambda\}$ can be obtained with several runs of Algorithm 2.1 with distinct regularization points $p_k$. Figure 3.1(b) shows the principal curvature in x-direction, $\kappa_x$, for these solutions $f_\lambda$.

We can observe: 



• All solutions are located in the sight cone of the camera. Corresponding points on different surfaces $f_\lambda$ are connected via camera sight rays; in other words, there exists a scaling.

• Because the LC display is located at a finite position, the solutions are rotated while traveling along a sight ray.

• The shape, i.e., the surface curvature, changes along the sight rays. Along a sight ray, points with equal local curvature do not exist. Thus all solutions have different shapes. This observation is important when dealing with the uniqueness of the stereo-based normal estimation problem.

• At infinity, all solutions converge to an ellipsoid with infinite radius, i.e., if we place the object under test far enough away from the inspection system, selecting the correct surface is an easy task due to the small change of the surface shape with distance. This can be described as regularization by approximation.
Figure 3.2: Inspection setup. Shown are three sets of parameter variations, see text.



The dependency of the solution manifold on the geometrical setup can be evaluated by looking at the vector gradient of the normal field in a given direction $v$,

$$
V = J_n(n_m) \cdot v,
$$

with the Jacobian matrix $J_n$ of the normal field. We evaluate the norm of $V$ for three parameter sets of the geometrical inspection setup, cf. Figure 3.2.
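The quantity $\|J_n(n_m) \cdot v\|$ can be estimated numerically without forming the full Jacobian, by taking a central difference along $v$. The Python sketch below assumes NumPy. As a stand-in for the induced normal field it uses the law-of-reflection bisector between the directions to the camera center and to one fixed LCD point. That is a hypothetical simplification, since in a real measurement the observed LCD point changes with the surface point, and the LCD position used is invented for illustration.

```python
import numpy as np

def directional_gradient(field, x, v, h=1e-4):
    """Central-difference estimate of J(x) v, with v normalised to unit length."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    x = np.asarray(x, dtype=float)
    return (field(x + h * v) - field(x - h * v)) / (2.0 * h)

def bisector_normal(x, camera, lcd_point):
    """Unit normal that reflects light from lcd_point at x towards camera (law of reflection)."""
    to_cam = camera - x
    to_lcd = lcd_point - x
    n = to_cam / np.linalg.norm(to_cam) + to_lcd / np.linalg.norm(to_lcd)
    return n / np.linalg.norm(n)

camera = np.array([0.0, 0.0, 150.0])       # camera center (mm), as in the simulation above
lcd_point = np.array([0.0, 0.0, 400.0])    # hypothetical fixed LCD pixel (mm)
axis = np.array([150.0, 0.0, -150.0])      # direction towards the look-at point (150, 0, 0)
axis /= np.linalg.norm(axis)

def field(p):
    return bisector_normal(p, camera, lcd_point)

for z_c in (100.0, 200.0, 300.0):          # distance from the camera center along the axis
    g = directional_gradient(field, camera + z_c * axis, axis)
    print(z_c, np.linalg.norm(g))
```

Since the normals have unit length, the directional derivative is always orthogonal to the normal itself, which is a convenient sanity check for any implementation.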



In the first setup, the position of the LC display changes, whereas the positions of camera and object remain fixed. The camera's distance to the xy-plane is 500 mm. Figure 3.3 shows the norm of the vector gradient in the direction of the optical camera axis, $\|J_n(n_m) \cdot e_z\|$, for three positions of the LCD; here $z_{LCD}$ denotes the distance of the LCD center to the xy-plane and $z_c$ the distance from the optical camera center along the optical axis in camera coordinates, cf. Figure 3.2(a).

With increasing distance of the LCD from the object plane, the change of the normal field decreases. The norm of the vector gradient converges to the same value for all LCD positions. This normal field characteristic is shown in Figure 3.1(b)
Figure 3.3: Norm of the vector gradient of the normal field against the distance from the optical camera center along a sight ray for different LCD positions, for the setup depicted in Figure 3.2(a).



by means of principal curvature. The larger the distance of the LCD to the object, 
the smaller the change of the normal field along a sight ray. 

Figure 3.4 shows the norm of the vector gradient in the direction of the optical camera axis for three distances of the whole sensor to the object; $z_0$ denotes the sensor distance, cf. Figure 3.2(b). The norm of the vector gradient does not differ significantly for the three sensor positions, but decreases with increasing $z_0$.



Finally, Figure 3.5 shows the norm of the vector gradient in the direction of the optical camera axis for several polar angles, cf. Figure 3.2(c). With increasing polar angle, the maximum of the normal field change moves to greater values of $z_c$. If one wants a maximal change in the normal field, e.g., for determining surface points with a stereo approach, a medium polar angle such as 45° leads to good results. Conversely, if one needs a very small normal change, the smallest possible polar angle should be selected. Small changes are favored in cases like the regularization-by-approximation approach: with a plane cut through the normal field, only small errors with respect to the correct normals are made.



The following rules of thumb for the experimental design of inspection systems for specular surfaces can be given:
Figure 3.4: Norm of the vector gradient of the normal field against the distance from the optical camera center along a sight ray for different sensor positions, for the setup depicted in Figure 3.2(b).

1. Determining surface points with a stereo setup, e.g., for the regularization of the shape-from-specular-reflection problem, requires strongly varying normal fields along search rays. The stereo method involves determining normal disparities such as $n_m^{(1)} - n_m^{(2)}$ between the normals of the two measurements. For further details see [WI93, BS03, KKH04, PT04]. Large disparities can be achieved through:

• Small camera-to-object and small LCD-to-object distances.

• An angle of about 45° between the camera/LCD axis and the mean object normal.

2. Determining surface normals with high precision is supported by small changes in the normal field. This allows approximation approaches for the selection of the correct normals, even through plane cuts. To achieve small changes one can select:

• Large system-to-object distances.

• Large LCD-to-object distances.

• Small angles between the camera/LCD axis and the mean object normal.
Figure 3.5: Norm of the vector gradient of the normal field against the distance from the optical camera center along a sight ray for different polar angles, for the setup depicted in Figure 3.2(c).

4 Conclusions and Further Work 



In this report we presented a new approach for shape reconstruction using surface normal data. It is shown that an optimal reconstruction can be achieved through the implicit use of two minimization problems: the normals to the reconstructed surface fit best to the measured normals, and the surface has minimal distance to given regularization points. This allows the combination of surface gradient methods with position-sensitive methods, especially for the inspection of partially specular surfaces. Furthermore, the proposed approach is insensitive to unbiased disturbances of normal and position data.

Two main aspects regarding normal field sensitivity can be considered for the setup of systems for automated inspection of specular surfaces: first, determining points on specular surfaces requires strongly varying normal fields along search directions, and second, surface normals can be estimated with high precision in regions with small changes. For both inspection tasks, some rules of thumb for the experimental design of the inspection systems are given. These insights are obtained by investigating the changes in the normal field induced by the measurement.
In an upcoming publication we will present an extension of the proposed algorithm to the inspection of complex surfaces, which requires the combination of several measurements from distinct positions.

The characteristics of the induced normal field will be investigated further, above all with regard to the global uniqueness problem of specular stereo regularization.
Abstract: Uncertainties in the sensor data, such as measurement noise, false detections caused by clutter, as well as merged, split, incomplete or missed detections due to sensor malfunction or occlusions (caused both by the limited sensor field of view and by objects in the scene), make multi-target tracking a very complicated task. Thus, one of the big challenges is track management and correct data association between detections and tracks. In this contribution we present an algorithm for visual detection and tracking of multiple extended targets under occlusions and split and merge effects. Unlike most state-of-the-art approaches, we utilize low-level information and integrate it in a unified approach based on a threshold-free probabilistic conception. The introduced scheme makes it possible to utilize information about the composition of the measurements, gained through tracking of dedicated feature points in the image, and it resolves data association ambiguities in a soft decision using a globally optimal probabilistic data association approach. Besides considering the evolution of existence, we also exploit the spatial and temporal relationship between stably tracked points and tracked objects, which, along with observability analysis, allows us to reconstruct compatible measurements and thus to update tracks correctly even in cases of splits, merges and partial occlusions of the tracked targets.
1 Introduction

Most of the vision-based vehicle detection and tracking systems presented in the field of driver assistance systems in the last few decades focused on front-looking applications, in particular on highway driving. Many
of them make use of various presumptions and restrictions regarding possible ob- 
jects’ motion profiles and their lateral position (e.g. relative to the tracked lane 
markings), as well as assumptions about symmetrical appearance, shadows etc. 
In many other applications such as intersection assistance systems or side-looking 
pre-crash systems, most of those restrictions and assumptions do not apply any 
more. Many different object orientations have to be taken into account. Com- 
bined with a large variety of object types and large region of interest, this makes 
it extremely challenging to detect and track objects based on their appearance in 
real time. For the realization of such applications that are capable of a robust and 
reliable object detection, a generic approach has to be chosen. Object hypothe- 
ses are often generated from the range data that are the result of binocular stereo 
or motion stereo processing. Finding corresponding structures in two video im- 
ages and reconstructing their depth using knowledge of mutual orientation of both 
viewpoints, delivers range data, the so-called depth maps. After extracting a depth 
map, road location is estimated and points belonging to the ground plane are re- 
moved. Spatial clustering of the remaining three-dimensional points delivers point 
clouds which are used as an input in the data association step. 

Due to the noisy range estimation process, combined with the well known prob- 
lems of stereo vision systems such as difficulties of depth estimation in homoge- 
neous image regions and gross depth errors in regions with regular patterns, the 
results of clustering may vary from frame to frame leading to incomplete, split, 
merged or missing object measurements as well as to phantom objects. Further problems are incomplete measurements due to partial occlusions and limitations of the field of view (FoV) of the sensors. Visible dimensions of objects entering or leaving the FoV or becoming occluded may change rapidly, which, combined with a
centroid-based object tracking approach leads to strongly biased object position 
and dynamics estimation. 

The basic idea of this contribution is depicted in Figure 1.1. Instead of "blind" association between detections and tracks, a novel detection-by-tracking algorithm is proposed which allows for a correct reconstruction of appropriate detections based
on a feature-based probabilistic data association scheme. It utilizes information 
gained by tracking object points in the image (and in 3D space) for creation of 
point-to-track affiliation information. This information is then used for reconstruc- 
tion of appropriate measurements and allows for a correct update of the object’s 
position and dynamics in spite of partial occlusions, splits and merges. 



Feature-Based Probabilistic Data Association and Tracking 








Figure 1.1: (a) Wrong data association in standard tracking-by-detection approach 
in case of a clutter-based cluster merge, (b) Basic idea of the FBPDA: utilization of 
the information about point-to-track affiliation for “reconstruction” of appropriate 
measurements for the tracking (detection-by-tracking). 
Track initialization and termination as well as data association are done using a globally optimal probabilistic approach which takes into account object existence and observability probability and implements a track-before-detect approach. Additionally, we propose an observability treatment scheme utilizing a grid-based object representation with occupancy and occlusion modeling for each cell. This allows for a dedicated observability and occlusion handling.

The proposed Feature-Based Probabilistic Data Association and Tracking Algorithm (FBPDA) consists of four major steps: time forward prediction, association between detections and tracks, reconstruction of composite measurements from associated point clouds, and innovation. These four steps are described in Sections 3, 4, 5 and 6, respectively. The overall framework for visual object tracking is introduced in Section 2.



2 Overall system description 



The overall video-based object tracking framework used in this work is depicted in Figure 2.1. It has been built in the course of the EU-funded project APROSYS [APR]. The goal was the detection of imminent side collisions to enable timely activation of novel occupant protection systems [TZM+08]. The sensor system under consideration here is a side-looking stereo video camera [TWG07]. Ego-motion estimation and compensation is done using vehicle odometry data, which was shown to deliver sufficient accuracy even for the structure-from-motion task [WPGW07]. Detection and tracking of the feature points is realized similarly to the system proposed in [FRBG05]. Up to 3000 feature points are tracked simultaneously in 3-D space using Kalman filters. Their six-dimensional state vectors $[x, y, z, v_x, v_y, v_z]^T$ are estimated from the stereo depth measurements as well as from their displacement in the image between consecutive frames (optic flow). The measurement vector is thus $[u, v, d]^T$ with image coordinates $(u, v)$ and feature depth $d$.

[Figure: block diagram of the framework; input blocks include "Stereo image acquisition" and "Vehicle data"]

Figure 2.1: Overall framework for visual object detection and tracking
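The relation between the six-dimensional feature state and the measurement $[u, v, d]^T$ can be sketched as an EKF measurement function. The Python sketch below is an illustration built on assumptions that the report does not state. It uses an ideal pinhole camera whose frame coincides with the state frame, takes $d$ as the depth along the optical axis, and uses hypothetical intrinsics `F_PX`, `CU`, `CV`.

```python
import numpy as np

F_PX, CU, CV = 800.0, 320.0, 240.0   # hypothetical focal length and principal point (pixels)

def measure(state):
    """Predicted measurement [u, v, d] for a feature state [x, y, z, vx, vy, vz] in camera coordinates."""
    x, y, z = state[:3]
    return np.array([CU + F_PX * x / z, CV + F_PX * y / z, z])

def measurement_jacobian(state):
    """Analytic 3 x 6 Jacobian of measure(); the velocity columns are zero."""
    x, y, z = state[:3]
    H = np.zeros((3, 6))
    H[0, 0] = F_PX / z
    H[0, 2] = -F_PX * x / z**2
    H[1, 1] = F_PX / z
    H[1, 2] = -F_PX * y / z**2
    H[2, 2] = 1.0
    return H

state = np.array([0.5, -0.2, 10.0, 1.0, 0.0, -0.5])   # hypothetical feature (m, m/s)
print(measure(state))  # [360. 224.  10.]
```

The zero velocity columns show why the velocities are only observable through the temporal correlation introduced by the motion model, i.e., through the optic flow between consecutive frames.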

After elimination of the ground points, the remaining points are clustered into point clouds that provide the measurements for object tracking. Object parameters are estimated using an Extended Kalman Filter. Internally, objects are modeled as cuboids with a centroid $(x, y, z)$, dimensions $(l, w, h)$, geometrical orientation $\phi$, motion orientation $\psi$, speed $v$, acceleration $a$ and yaw rate $\dot\psi$. This corresponds to the Constant Yaw Rate Model with Acceleration, which has proven to deliver the best performance for a side-looking system [RGW08]. Because often only a part of an object is visible to the sensors, the orientation of the object's motion may differ from the estimated geometrical orientation. The object's geometric orientation and dimensions are updated taking into account the occlusion information, as described in the following sections.
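For illustration, the prediction step of a constant-yaw-rate model with acceleration can be written as a small numerical integrator over the ground-plane part of the state. This Python sketch is our own simplification and not the report's filter. It ignores the centroid height, the cuboid dimensions and the geometric orientation $\phi$ (assumed constant during prediction). It also uses a midpoint rule instead of a closed-form solution and omits the covariance propagation an EKF would also need.

```python
import math

def predict_cyra(state, dt, substeps=100):
    """Propagate (x, y, psi, v, a, omega) over dt, assuming constant yaw rate omega and acceleration a.

    psi is the motion orientation, omega = d(psi)/dt; the midpoint rule is used on each substep.
    """
    x, y, psi, v, a, omega = state
    h = dt / substeps
    for _ in range(substeps):
        v_mid = v + 0.5 * a * h
        psi_mid = psi + 0.5 * omega * h
        x += v_mid * math.cos(psi_mid) * h
        y += v_mid * math.sin(psi_mid) * h
        v += a * h
        psi += omega * h
    return (x, y, psi, v, a, omega)

# Straight-line check: x = v0 t + a t^2 / 2 = 10 * 1 + 0.5 * 2 * 1 = 11.
print(predict_cyra((0.0, 0.0, 0.0, 10.0, 2.0, 0.0), dt=1.0)[0])  # approximately 11.0
```

With $\omega = 0$ the model reduces to constant acceleration along a straight line, and with $a = 0$ a full turn $\omega\,\Delta t = 2\pi$ returns the object to its starting point, which makes both limits easy to check.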



For performing the association between the point clouds and the tracks and for 
the track state propagation we propose to use the Feature-Based Probabilistic Data 
Association and Tracking Algorithm (FBPDA) which is described in the following 
sections. For effective noise handling, as well as for handling split and merged point clouds, FBPDA provides affiliation probabilities of each point to the current tracks. These affiliation probabilities are exploited in the course of data association as well as for updating the tracks' position, dynamics and dimensions.



For handling occlusions we explicitly model and propagate the targets' observability probability, thus decoupling existence and visibility of the targets. Contrary to other approaches ([MES94], [MSL+08]), we do not bundle two of the three possible states ($\exists\mathcal{D}$: "object existent and observable", $\exists\bar{\mathcal{D}}$: "object existent but not observable" and $\nexists$: "object non-existent" ("phantom track")), but model the object observability by means of a separate Markov chain with the states "object observable" and "object not observable", as depicted in Fig. 2.2(d). This leads to an object state propagation scheme with three cross-coupled Markov chains, as depicted in Fig. 2.3. One of the Markov chains is responsible for object existence propagation and implements a JIPDA scheme [ME02], the second one models the dynamic object state propagation, which is done using an Extended Kalman Filter, and the third one is used for object observability modeling.
Figure 2.2: Markov chain based modeling of object existence and observability (a). The three possible track states $\exists\mathcal{D}$, $\exists\bar{\mathcal{D}}$ and $\nexists$ (object existent and observable, object existent but not observable, and object non-existent (phantom track)) can be bundled in different ways. One possibility is to bundle the first two states into the umbrella states "object existent" and "object non-existent" ($\exists$ and $\nexists$), as done in [MES94] (b). Another possibility is to model occluded objects as non-existent, as done in [MSL+08] (c). This leads to existence-based track termination of occluded targets. FBPDA decouples existence and observability by modeling object observability as a separate Markov chain (d).



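[Figure: diagram of the three cross-coupled Markov chains; one panel is labeled "Target Existence" and shows the existence probabilities of the previous frame]

Figure 2.3: Three cross-coupled Markov chains used in the FBPDA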
3 Time forward prediction

At each time step, after performing the ego-motion compensation, we start by predicting the new track attributes. This includes prediction of the track's dynamic state as well as prediction of its existence and observability probabilities. The FBPDA state prediction of a track is identical to the common Extended Kalman Filter prediction. The time forward prediction equations for the existence and non-existence probabilities $p^x_{k|k-1}(\exists)$ and $p^x_{k|k-1}(\nexists)$ of a track $x$ are:

$$
\begin{aligned}
p^x_{k|k-1}(\exists) &= p^x(\exists \to \exists) \cdot p^x_{k-1|k-1}(\exists) + p^x(\nexists \to \exists) \cdot p^x_{k-1|k-1}(\nexists),\\
p^x_{k|k-1}(\nexists) &= 1 - p^x_{k|k-1}(\exists),
\end{aligned}
$$

with $p^x_{k-1|k-1}(\exists)$ being the a-posteriori existence probability from the last frame, and $p^x(\exists \to \exists)$ and $p^x(\nexists \to \exists)$ denoting the persistence and birth probabilities of track $x$. These last two factors are used to model the spatial distribution of target birth and death probabilities. This makes it possible to account, e.g., for the fact that at the borders of the field of view and at far distances the birth probability is higher than right in front of the sensors.
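The existence prediction above is a single step of a two-state Markov chain. Here is a minimal Python sketch. The spatially varying persistence and birth probabilities are passed in as plain numbers, and how they are derived from the track position is left open, as in the text.

```python
def predict_existence(p_exist_post, p_persist, p_birth):
    """One prediction step of the existence Markov chain.

    p_exist_post: a-posteriori existence probability p_{k-1|k-1}(exists)
    p_persist:    persistence probability p(exists -> exists)
    p_birth:      birth probability p(not exists -> exists)
    Returns (p_{k|k-1}(exists), p_{k|k-1}(not exists)).
    """
    p_exist = p_persist * p_exist_post + p_birth * (1.0 - p_exist_post)
    return p_exist, 1.0 - p_exist

print(predict_existence(0.9, p_persist=0.95, p_birth=0.05))  # approximately (0.86, 0.14)
```

A step of the same form could plausibly be applied to the observability chain with its own transition probabilities; that reuse is our assumption, not a statement of the report.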



For the computation of the observability probability and for the reconstruction of occluded measurements, we use a grid-based 3-D representation of the targets. For each track $x_i$ we define a 3-D grid with its origin at the track's centroid. The orientation of the grid is aligned with the track's orientation (cf. Fig. 3.1(a)). Using this representation of the predicted objects, it is possible to calculate their appearance masks in the camera image, $M^{x_i}(u, v) \in \{0, 1\}$, by projecting the occupied grid cells into the image. The appearance probability mask $M_p^{x_i}(u, v)$ of each track is given by

$$
M_p^{x_i}(u, v) = p(A_{x_i}(u, v)) = M^{x_i}(u, v) \cdot p^{x_i}_{k|k-1}(\exists),
$$

where $A_{x_i}(u, v)$ is the event of track $x_i$ appearing at image position $(u, v)$ and $p^{x_i}_{k|k-1}(\exists)$ is the predicted existence probability of track $x_i$. By overlaying the appearance probability masks of all objects lying in front of object $x_i$, we get the occlusion probability map for this object in the new frame (cf. Fig. 3.1(b)). The occlusion probability $p_{x_i}(\bar{\mathcal{D}}, u, v)$ at each pixel $(u, v)$ is calculated as

$$
p_{x_i}(\bar{\mathcal{D}}, u, v) = p\Big( \bigcup_{x_r \in X_i^{(u,v)}} A_{x_r}(u, v) \Big),
$$

with $X_i^{(u,v)}$ being the set of objects lying at pixel position $(u, v)$ in front of object $x_i$. After the occlusion probability map for an object is built, we can estimate the occlusion probability $p_{x_i}(\bar{\mathcal{D}}, c)$ of the object's grid cells $c$. This is done by
[Figure: (a) 3-D grid attached to a tracked object; (b) occlusion map with example cell occlusion probabilities $p_x(\bar{\mathcal{D}}, c) = 0.0$ and $p_x(\bar{\mathcal{D}}, c) = 0.8$]

Figure 3.1: Grid-based object representation (a) and occlusion probability estimation for grid cells using the occlusion map (b)

projecting the cell centers into the image and reading the corresponding value of the occlusion probability map, as shown in Fig. 3.1(b).
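The occlusion map and the cell lookup can be sketched with NumPy as below. The text defines the pixel occlusion probability as the probability of a union of appearance events but does not state how it is evaluated. The sketch assumes these events are independent across objects, so that $p = 1 - \prod_r (1 - M_p^{x_r}(u, v))$. It also assumes that the masks of the objects in front of $x_i$ have already been selected, and that cell centers have been projected to pixel coordinates $(u, v)$ beforehand.

```python
import numpy as np

def occlusion_map(masks_in_front, p_exist_in_front):
    """Pixelwise probability that at least one object in front appears there.

    Assumes independent appearance events: p = 1 - prod_r (1 - M^{x_r}(u, v) * p^{x_r}(exists)).
    """
    p_free = np.ones_like(masks_in_front[0], dtype=float)
    for mask, p_exist in zip(masks_in_front, p_exist_in_front):
        p_free *= 1.0 - mask * p_exist          # 1 - appearance probability mask M_p
    return 1.0 - p_free

def cell_occlusion(occ_map, cell_pixels):
    """Read the occlusion map at projected cell centres given as (u, v) = (column, row)."""
    u = np.clip(np.round(cell_pixels[:, 0]).astype(int), 0, occ_map.shape[1] - 1)
    v = np.clip(np.round(cell_pixels[:, 1]).astype(int), 0, occ_map.shape[0] - 1)
    return occ_map[v, u]

masks = [np.zeros((4, 4)), np.zeros((4, 4))]
masks[0][:, :2] = 1.0          # hypothetical object 1 covers the left half
masks[1][:2, :] = 1.0          # hypothetical object 2 covers the top half
occ = occlusion_map(masks, [0.8, 0.5])
cells = np.array([[0.2, 0.1], [3.0, 3.0], [0.0, 3.4]])
print(cell_occlusion(occ, cells))  # approximately [0.9, 0.0, 0.8]
```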

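Based on these probabilities we can calculate the observability transition probabilities $p_{x_i}(\mathcal{D} \to \mathcal{D})$, $p_{x_i}(\mathcal{D} \to \bar{\mathcal{D}})$, $p_{x_i}(\bar{\mathcal{D}} \to \mathcal{D})$ and $p_{x_i}(\bar{\mathcal{D}} \to \bar{\mathcal{D}})$ of a track and predict its observability probability and occlusion probability in the new frame: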



4 Data association between detections and tracks 



Given multiple active tracks and multiple detections, there are often several assignment possibilities, each more or less probable. Unlike other methods such as GNN, FBPDA does not choose one of these hypotheses for the innovation of a track, but considers all assignment possibilities in a soft decision. For this purpose we define the set $X = \{X', x_B, ©\}$ as the aggregation $X' = \{x_1, x_2, \ldots, x_n\}$ of the $n$ current tracks plus two special elements: $x_B$, representing a so far unknown object, and ©, representing a clutter source. Likewise, we define the set $Z = \{Z', z_0, z_{\bar{\mathcal{D}}}, z_{\nexists}\}$ as the aggregation $Z' = \{z_1, z_2, \ldots, z_m\}$ of the $m$ current point clouds plus three special elements $z_0$, $z_{\bar{\mathcal{D}}}$ and $z_{\nexists}$. The element $z_0$ stands for an erroneously missed detection caused by sensor failure, $z_{\bar{\mathcal{D}}}$ represents the correct absence of a detection because of occlusion of the target, and $z_{\nexists}$ the correct absence of a detection because of target non-existence (death). The association between point clouds and tracks is modeled as a bipartite graph with edges $e : (x \in X \mapsto z \in Z)$ between elements of $X$ and







elements of Z. Assignments between two special elements are prohibited. A valid 
assignment hypothesis can thus contain six types of edges e: 



• $x_i \mapsto z_j$: assumption that point cloud $z_j$ has been caused by track $x_i$

• $x_B \mapsto z_j$: point cloud $z_j$ has been caused by a so far unknown object

• $© \mapsto z_j$: assumption that point cloud $z_j$ has been caused by clutter

• $x_i \mapsto z_{\bar{\mathcal{D}}}$: track $x_i$ did not cause a point cloud because it is not observable

• $x_i \mapsto z_{\nexists}$: track $x_i$ did not cause a point cloud because it does not exist

• $x_i \mapsto z_0$: track $x_i$ did not cause a point cloud because of a sensing error

Fig. 4.1 illustrates four valid assignment hypotheses for the case of n = 2 tracks 
and m = 2 point clouds. 



[Figure: four assignment hypotheses (Assignment Hypotheses 1 to 4), each drawn as a bipartite graph between $X = \{x_1, x_2, x_B, ©\}$ and $Z = \{z_1, z_2, z_0, z_{\bar{\mathcal{D}}}, z_{\nexists}\}$]

Figure 4.1: Examples of valid assignment hypotheses
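The probabilities of the six edge types are calculated in analogy to [MSL+08]:

$$
\begin{aligned}
p(e = (x_i \mapsto z_j)) &= p^{x_i}_{k|k-1}(\exists) \cdot p^{x_i}_{k|k-1}(\mathcal{D}) \cdot p_{z_j}(\mathrm{TP} \mid x_i) \cdot (1 - p_{z_j}(\mathrm{FP})),\\
p(e = (x_i \mapsto z_{\bar{\mathcal{D}}})) &= p^{x_i}_{k|k-1}(\exists) \cdot (1 - p^{x_i}_{k|k-1}(\mathcal{D})),\\
p(e = (x_i \mapsto z_0)) &= p^{x_i}_{k|k-1}(\exists) \cdot p^{x_i}_{k|k-1}(\mathcal{D}) \cdot p_{x_i}(\mathrm{FN}),\\
p(e = (x_i \mapsto z_{\nexists})) &= (1 - p^{x_i}_{k|k-1}(\exists)) \cdot p^{x_i}_{k|k-1}(\mathcal{D}) \cdot (1 - p_{x_i}(\mathrm{FN})),\\
p(e = (x_B \mapsto z_j)) &= \Big(1 - \sum_i p_{z_j}(\mathrm{TP} \mid x_i)\Big) \cdot (1 - p_{z_j}(\mathrm{FP})),\\
p(e = (© \mapsto z_j)) &= \Big(1 - \sum_i p_{z_j}(\mathrm{TP} \mid x_i)\Big) \cdot p_{z_j}(\mathrm{FP}),
\end{aligned} \tag{4.1}
$$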
with $p_{z_j}(\mathrm{FP})$ being the probability that a clutter-caused point cloud appears at the position of $z_j$, $p_{x_i}(\mathrm{FN})$ the false negative probability for the track $x_i$, and $p_{z_j}(\mathrm{TP} \mid x_i)$ the likelihood function obtained from the Kalman filter.

With the edge probabilities in (4.1), the probability of an assignment hypothesis $E = \{e_1, \ldots, e_G\}$ can now be calculated as:

$$
p(E) = \prod_{a=1}^{G} p(e_a)
$$

An association probability is the sum of the probabilities of all assignment hypotheses that include the edge $e = (x_i \mapsto z_j)$. This sum is divided by the sum of the probabilities of all assignment hypotheses that assume track $x_i$ to exist:

$$
\beta^{x_i}_{z_j} = \frac{\sum_{\{E \,\mid\, e = (x_i \mapsto z_j) \in E\}} p(E)}{\sum_{\{E \,\mid\, e = (x_i \mapsto z_{\nexists}) \notin E\}} p(E)} \tag{4.2}
$$

$\beta^{x_i}_0$ is the probability of the event that no point cloud originated from track $x_i$. It is calculated analogously to (4.2).
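For small numbers of tracks and point clouds, the hypothesis probabilities and association probabilities can be computed by brute-force enumeration. The sketch below only illustrates the definitions and is not the report's implementation. It makes the following assumptions:

- Edge probabilities such as those of (4.1) are passed in as dictionaries.
- The special elements $z_{\ntriangleright}$, $z_{\nexists}$, $z_0$, $x_B$ and $\otimes$ are represented by hypothetical string labels.
- Every point cloud not claimed by a track is explained either by a new object or by clutter.
- The numbers in the usage lines are made up.

Exhaustive enumeration grows combinatorially, so this approach is only practical for very small problems.

```python
from itertools import product

# Hypothetical labels for the special elements (not notation from the report)
TRACK_TARGETS = ("occluded", "nonexistent", "missed")  # z_occl, z_nexists, z_0
CLOUD_SOURCES = ("new", "clutter")                      # x_B, clutter source


def enumerate_hypotheses(n, m, p_track, p_cloud):
    """Yield (track_targets, cloud_sources, p(E)) for every valid hypothesis.

    p_track[i][t]: probability of the edge x_i -> t, where t is a point-cloud
                   index 0..m-1 or one of TRACK_TARGETS.
    p_cloud[j][s]: probability of the edge s -> z_j, s in CLOUD_SOURCES.
    """
    options = list(range(m)) + list(TRACK_TARGETS)
    for targets in product(options, repeat=n):
        used = [t for t in targets if isinstance(t, int)]
        if len(used) != len(set(used)):
            continue  # a point cloud is caused by at most one track
        free = [j for j in range(m) if j not in used]
        p_tracks = 1.0
        for i, t in enumerate(targets):
            p_tracks *= p_track[i][t]
        # every point cloud not claimed by a track comes from x_B or clutter
        for sources in product(CLOUD_SOURCES, repeat=len(free)):
            p = p_tracks
            for j, s in zip(free, sources):
                p *= p_cloud[j][s]        # p(E) = product of edge probabilities
            yield dict(enumerate(targets)), dict(zip(free, sources)), p


def association_probability(i, j, hypotheses):
    """beta^{x_i}_{z_j}: mass of hypotheses containing x_i -> z_j divided by
    the mass of all hypotheses in which track x_i exists."""
    num = sum(p for t, _, p in hypotheses if t[i] == j)
    den = sum(p for t, _, p in hypotheses if t[i] != "nonexistent")
    return num / den


# Usage with made-up edge probabilities for one track and one point cloud
p_track = [{0: 0.6, "occluded": 0.1, "nonexistent": 0.2, "missed": 0.1}]
p_cloud = [{"new": 0.3, "clutter": 0.7}]
hyps = list(enumerate_hypotheses(1, 1, p_track, p_cloud))
beta = association_probability(0, 0, hyps)
```

In this toy case, the track-existent hypotheses have a total mass of 0.8. Of that, 0.6 belongs to the hypothesis $x_1 \mapsto z_1$, so $\beta$ is about 0.75.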

5 Handling of split and merge effects and reconstruction of compatible object measurements

To handle possible split and merge events during data association, one option is to allow the assignment of a set of point clouds to one track, or of a set of tracks to one point cloud [KRSW06]. This would make it possible to identify splits and merges. However, it would not allow an appropriate update, since the update requires a one-to-one association. Another possibility is to create virtual measurements from the original measurements by splitting and merging them using the predicted states of the tracked objects, as proposed in [GOM04]. This approach maintains one-to-one matchings between objects and associated point clouds. It also allows the track’s state to be updated once a compatible state measurement has been created. The disadvantages of this method are the ambiguous partitioning of the original measurements in the case of merging targets and the rapidly growing number of feasible associations.

Furthermore, when a centroid-based tracking approach is used, the center of gravity (CoG) of the measurement resulting from split or merged targets might not lie inside the gating ellipse of the targets’ tracks. Another problematic case is the position update of a target that is partially occluded, either by other objects in the scene or by the restricted field of view of the sensors. If the target position is updated based on the CoG of the point cloud, the track’s centroid is shifted. This induces an impulse and introduces a bias into the estimated track state.

To avoid such problems, we reconstruct the track’s centroid and orientation for each detection-to-track association using stably tracked points. For this purpose, we use information about the affiliation of the tracked points to the tracks, which is independent of the splits and merges currently occurring. The reconstructed centroids can then be used instead of the CoGs of the corresponding point clouds to compute the state measurement for each track.



5.1 Determination of the point-to-track affiliation probabilities 

The affiliation probability $p(x_i \mapsto p_q)$ of a point $p_q$ to a tracked object $x_i$ is determined from the association probability $\beta^{x_i}_{z}$ of the point cloud $z$ containing $p_q$ to that object. To realize a memory effect, we filter the affiliation probabilities using a gain constant $g \in [0, 1]$:

$$
p_k(x_i \mapsto p_q) = g \cdot \beta^{x_i}_{z} + (1 - g) \cdot p_{k-1}(x_i \mapsto p_q).
$$
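The recursive filter above is a form of exponential smoothing. The following minimal sketch makes several assumptions that the report does not specify:

- The affiliation probabilities of one track are kept in a dictionary keyed by a hypothetical point identifier.
- A point seen for the first time starts from an affiliation of 0.
- A point that is not observed in the current frame keeps its previous value.

```python
def update_affiliations(previous, current_beta, g):
    """Filter point-to-track affiliation probabilities for one track.

    previous:     dict point_id -> p_{k-1}(x_i -> p_q)
    current_beta: dict point_id -> beta of the point cloud containing p_q
                  in frame k (points not observed in frame k are absent)
    g:            gain constant in [0, 1]
    Starting new points at 0 and keeping unobserved points unchanged are
    assumptions of this sketch.
    """
    if not 0.0 <= g <= 1.0:
        raise ValueError("gain g must lie in [0, 1]")
    updated = dict(previous)  # unobserved points keep their old value
    for point_id, beta in current_beta.items():
        updated[point_id] = g * beta + (1.0 - g) * previous.get(point_id, 0.0)
    return updated
```

A larger `g` lets the affiliation follow the current association quickly. A smaller `g` gives the filter more memory, which damps short-lived association errors during splits and merges.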



5.2 Point cloud based reconstruction of the track’s position and orientation



To reconstruct the track centroid $p_o$ from an associated point cloud, we first calculate the CoG $p_{\mathrm{CoG}}$ of the stably tracked points of this point cloud in both the current and the previous frame. For this, the points’ 3-D positions are weighted with their track affiliation probabilities, known from the previous frame.

Having computed $p_{\mathrm{CoG}}$ in both frames, we reconstruct, for each tracked point $p_q$, the vector $(\overrightarrow{p_{\mathrm{CoG}} p_o})_q$. This vector points from $p_{\mathrm{CoG}}$ to the object centroid $p_o$ in the current frame. It is obtained from the known orientation of this vector relative to the vector $\overrightarrow{p_{\mathrm{CoG}} p_q}$ in the previous frame (cf. Fig. 5.1). The sum of the resulting vectors $(\overrightarrow{p_{\mathrm{CoG}} p_o})_q$, weighted by the affiliation probabilities of the respective points $p_q$, yields the new position of the track’s centroid with respect to the considered point cloud. The track’s new orientation can be obtained in the same way.

Together, these parameters form the reconstructed measurement of the point cloud. All these measurements are weighted according to the association probabilities of the corresponding point clouds. Combined, they form a composite measurement, which is then used for the innovation of the Kalman filter responsible for the track’s dynamics.

Figure 5.1: Object position reconstruction based on spatial relationship to stably tracked points
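The reconstruction can be sketched in the ground plane as follows. This is an illustration under stated assumptions, not the authors’ implementation:

- It works in 2-D rather than with the 3-D positions.
- It assumes rigid motion between the two frames, so each vector $(\overrightarrow{p_{\mathrm{CoG}} p_o})_q$ keeps its length and its angle relative to $\overrightarrow{p_{\mathrm{CoG}} p_q}$.
- It normalizes the affiliation weights to sum to one.
- The function name `reconstruct_centroid` is hypothetical.

```python
import numpy as np


def reconstruct_centroid(prev_pts, curr_pts, weights, prev_centroid):
    """Reconstruct a track centroid from stably tracked points (2-D sketch).

    prev_pts, curr_pts: (n, 2) positions of the same tracked points in the
                        previous and the current frame
    weights:            (n,) affiliation probabilities of the points
    prev_centroid:      (2,) track centroid p_o in the previous frame
    """
    prev_pts = np.asarray(prev_pts, dtype=float)
    curr_pts = np.asarray(curr_pts, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalized weights
    cog_prev = w @ prev_pts                           # weighted CoG, frame k-1
    cog_curr = w @ curr_pts                           # weighted CoG, frame k
    v_o = np.asarray(prev_centroid, dtype=float) - cog_prev  # CoG -> p_o, k-1
    estimates = []
    for p_prev, p_curr in zip(prev_pts, curr_pts):
        v_prev = p_prev - cog_prev                    # CoG -> p_q, frame k-1
        v_curr = p_curr - cog_curr                    # CoG -> p_q, frame k
        d = np.arctan2(v_curr[1], v_curr[0]) - np.arctan2(v_prev[1], v_prev[0])
        rot = np.array([[np.cos(d), -np.sin(d)], [np.sin(d), np.cos(d)]])
        # rotating v_o by the same angle keeps its orientation relative to
        # the vector CoG -> p_q
        estimates.append(rot @ v_o)
    return cog_curr + w @ np.array(estimates)         # weighted sum of vectors
```

For a pure translation, every per-point estimate coincides, and the result is the previous centroid shifted by the CoG displacement.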



5.3 Grid-based reconstruction of the track’s extent 

The extent of a track is obtained in a similar way. We create one composite measurement from all point clouds associated with the track. This is done using the grid representation of the track, which is aligned according to the new position and orientation obtained through the innovation of the track’s dynamics. The points of each associated point cloud are sorted into the grid. Each point contributes to the occupancy value of its grid cell according to its affiliation probability to the track and the association probability of its point cloud. For each cell $c$, the current occupancy value $o$ at time step $k$ with respect to the track $x_i$ is computed according to

$$
o^{x_i}_k(c) = \sum_{j=1}^{m} \beta^{x_i}_{z_j} \sum_{q=1}^{l} p(x_i \mapsto p_q)
$$

with $m$ being the current number of point clouds and $l$ being the number of points $p_q$ that belong to the point cloud $z_j$ and fall into the cell $c$. The grid is updated using these occupancy values. To avoid updating the occupancy value of an occluded cell with 0, we filter the occupancy values of the cells using their occlusion probability. This process is visualized in Fig. 5.2. For each cell $c$, the filtered occupancy value $\bar{o}^{x_i}_k(c)$ at time step $k$ is given by

$$
\bar{o}^{x_i}_k(c) = p^{x_i}_k(\triangleright, c) \cdot o^{x_i}_k(c) + p^{x_i}_k(\ntriangleright, c) \cdot \bar{o}^{x_i}_{k-1}(c)
$$

with $p^{x_i}_k(\triangleright, c) = 1 - p^{x_i}_k(\ntriangleright, c)$.
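A minimal sketch of both steps follows. It makes these assumptions:

- The grid is stored as a dictionary keyed by hypothetical integer cell indices.
- The occlusion probability $p^{x_i}_k(\ntriangleright, c)$ of each cell is already available.
- Cells missing from the occlusion map are fully observable.
- Cells missing from the previous grid start at 0.

```python
from collections import defaultdict


def raw_occupancy(clouds, betas):
    """o_k(c): sum over point clouds z_j of beta_j times the summed
    affiliation probabilities of the cloud's points that fall into cell c.

    clouds: one list per associated point cloud z_j, holding one
            (cell, affiliation) pair per point p_q
    betas:  association probabilities beta^{x_i}_{z_j}, same order
    """
    occ = defaultdict(float)
    for points, beta in zip(clouds, betas):
        for cell, affiliation in points:
            occ[cell] += beta * affiliation
    return dict(occ)


def filter_occupancy(raw, previous, p_occluded):
    """Occlusion-aware update of the filtered occupancy values.

    p_occluded: dict cell -> probability that the cell is not observable;
                missing cells are treated as fully observable (assumption)
    """
    filtered = {}
    for c in set(raw) | set(previous):
        occl = p_occluded.get(c, 0.0)
        filtered[c] = (1.0 - occl) * raw.get(c, 0.0) + occl * previous.get(c, 0.0)
    return filtered
```

A fully occluded cell ($p = 1$) keeps its previous filtered value instead of being reset to 0. A fully visible cell takes the new raw occupancy.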

The object’s geometric orientation and dimensions are obtained in two steps. First, the main visible object surface is estimated in the top view using RANSAC. Second, a rectangle with this orientation is fitted into the ground projection of the occupied grid cells. Together, these parameters form the resulting composite measurement, which is then used for the innovation of the track’s geometric orientation and its dimensions.

Figure 5.2: Process of grid-based measurement composition for object extent
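One possible realization of these two steps is sketched below. It is an illustration, not the report’s implementation, and rests on these assumptions:

- The main visible surface appears as a dominant line among the top-view points.
- The line direction is estimated with a basic two-point RANSAC line fit. The threshold `tol`, the iteration count and both function names are hypothetical.
- The rectangle is the tightest box with that orientation around the occupied cell centers.

```python
import numpy as np


def ransac_direction(points, n_iter=200, tol=0.05, seed=0):
    """Direction angle (rad) of the dominant line among 2-D top-view points."""
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    best_count, best_angle = -1, 0.0
    for _ in range(n_iter):
        a, b = pts[rng.choice(len(pts), size=2, replace=False)]
        d = b - a
        length = np.hypot(d[0], d[1])
        if length == 0.0:
            continue                                   # degenerate sample
        normal = np.array([-d[1], d[0]]) / length
        # inliers: points closer than tol to the line through a and b
        count = int(np.sum(np.abs((pts - a) @ normal) < tol))
        if count > best_count:
            best_count, best_angle = count, float(np.arctan2(d[1], d[0]))
    return best_angle


def fit_rectangle(cell_centers, angle):
    """Length (along the surface) and width of the tightest rectangle with
    the given orientation around the occupied cell centers."""
    c, s = np.cos(angle), np.sin(angle)
    to_local = np.array([[c, s], [-s, c]])             # rotation by -angle
    local = np.asarray(cell_centers, dtype=float) @ to_local.T
    length, width = local.max(axis=0) - local.min(axis=0)
    return float(length), float(width)
```

The returned angle is only defined modulo $\pi$, since a line has no preferred direction.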



6 State, existence and observability innovation 



In the innovation step, all three Markov chains are updated. The innovation of the target’s physical attributes corresponds to the standard extended Kalman filter innovation. At the end of each frame, we switch the tracked reference point to the centroid position obtained from the grid-based object extent computation. This prevents the object dimensions from jittering due to one-sided changes of the visible object extent (e.g., when a target enters the field of view). The a posteriori probability of the track existence is calculated as the sum of the probabilities of all assignment hypotheses assuming track $x_i$ as existent, divided by the sum of the probabilities of all possible hypotheses:

$$
p^{x_i}_{k|k}(\exists) = \frac{\sum_{\{E \,\mid\, e = (x_i \mapsto z_{\nexists}) \notin E\}} p(E)}{\sum_{\{E\}} p(E)}
$$

The observability update is done analogously to the existence innovation.
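For a small set of enumerated hypotheses, the existence update reduces to two sums. This minimal sketch assumes the following:

- Each hypothesis is given as a pair of a mapping from track index to assigned target and its (unnormalized) probability $p(E)$.
- The hypothetical label `"nonexistent"` stands for $z_{\nexists}$.
- The probabilities in the usage lines are made up.

```python
def existence_posterior(track, hypotheses):
    """p_{k|k}(exists) for one track.

    hypotheses: list of (assignment, probability) pairs; assignment maps a
    track index to a point-cloud index or to a special label such as
    "occluded", "missed" or "nonexistent" (labels are assumptions).
    """
    total = sum(prob for _, prob in hypotheses)
    existent = sum(prob for assignment, prob in hypotheses
                   if assignment[track] != "nonexistent")
    return existent / total


# Usage with made-up hypothesis probabilities for a single track
hyps = [({0: 0}, 0.6), ({0: "occluded"}, 0.1),
        ({0: "missed"}, 0.1), ({0: "nonexistent"}, 0.2)]
p_exists = existence_posterior(0, hyps)
```

Because the hypothesis probabilities are products of unnormalized edge probabilities, dividing by the total mass is what turns the numerator into a probability.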
7 Experimental results

The algorithm has been validated with both simulated and real data. For our simulations, we used a tool that generated point clouds with predefined parameters and behavior. We modeled several scenarios that caused problems for the standard approach, such as splitting and merging point clouds, objects entering and leaving the FoV, and occlusions. The standard association and tracking scheme led to considerable corruption of the position and velocity estimates, and even to the termination and re-initialization of tracks. FBPDA, in contrast, managed to correctly update the tracks’ parameters through multiple merges, splits and occlusions.



8 Conclusion 

In this contribution we have presented an algorithm for visual detection and tracking of multiple extended targets. It is capable of coping with noisy, split, merged, incomplete and missed detections. The proposed approach resolves data association ambiguities by a soft decision. This decision is based not only on the target state prediction but also on the existence and observability estimation, modeled as two additional Markov chains. To estimate the desired object parameters correctly, low-level information about the measurement composition is utilized, gained by tracking dedicated feature points in the image and in 3-D space. Along with the occlusion analysis, the spatial and temporal relationship between the set of stably tracked points and the object’s centroid is exploited. This allows the desired object characteristics to be reconstructed from the data even in the case of detection errors due to a limited FoV, occlusions, noise and sensor malfunction. For tracking applications that have to cope with the above-mentioned effects, our algorithm offers a much-needed enhancement. It has the potential to greatly increase detection and tracking performance and overall system robustness.
Abstract: The current state of the art in pattern and character classification still leaves many problems unsolved. One of them is a classifier that is robust to noise or other distortions in the character images. A classifier should be easy to train and, at the same time, very robust to any possible errors in the character images. Furthermore, the classification procedure should be real-time capable. In this technical report we introduce a new classifier that is based on trellis diagrams. It basically works like a Viterbi decoder known from communication systems. The fundamentals of the training and classification procedures are discussed in detail. In addition, we show the performance of the classifier on data with and without additive noise of different levels.



1 Introduction 



In this report we introduce a new classifier, which is based on trellis diagrams. The performance of this classifier is demonstrated on an example from optical character recognition (OCR), a field with quite a long history. The first patent on OCR was filed in 1929 in Germany by Tausheck, whose system was based on optical and mechanical template matching. In the United States, Handel applied for the first patent on OCR in 1933. However, OCR was not satisfactorily applicable until the 1950s, when the first digital computers and improved scanning devices were introduced. In 1955 the first commercial OCR system was installed at Reader’s Digest, and since 1965 the US Postal Service has been using OCR for sorting mail. Nowadays, since memory and computational power have become cheap, OCR is applied in several fields, such as readers for the blind, automatic data entry of bank account information, and process automation.
In recent years many different systems have been developed. Most of them are based on statistical methods, artificial neural networks, support vector machines, or ensemble methods that combine several weak learners into one powerful classifier. These methods are well known and have been investigated in detail. However, many problems are still not adequately solved. For example, if the font or font size changes, or if the images are distorted, the performance of the classifiers decreases significantly. A detailed account of the advances and the remaining problems of common classifiers can be found in [LF08]. For more information about OCR the reader is referred to [MSY92, GS90, Man86].

1.1 Contributions 

The classifier we introduce in this report is applied to reading single printed numerals on different kinds of materials, sometimes under severe environmental conditions. Furthermore, the numerals vary in size and font, which requires a classifier with good generalization ability with respect to different fonts and font sizes. Since the classification task changes depending on the application, we need a classifier that can easily be augmented with additional classes or new training data. The different materials the characters are printed on can additionally cause errors in the character images, which can be interpreted as noise. Hence, the classifier has to be robust to noise in the character images as well.

The classifier we introduce in this report is based on trellis diagrams, where each diagram represents one model corresponding to one class. The models consist of states and weights, which are obtained in the training procedure. One model is similar to a Markov model, except that the number of states is variable: it changes with the pixel position in the character image. Furthermore, the transitions from state to state are weighted, which is similar to the transition probabilities known from Markov models.

For classification the “shortest” path through the trellis diagrams is determined 
with respect to a given test vector. The winning class is finally the model with the 
“shortest” path. Basically, the classification procedure is similar to signal detection 
of discrete time signals as known from many communication systems. For this 
reason we use the Viterbi algorithm for the evaluation of the trellis diagrams, i.e., 
for classification. 

In this report we describe the training procedure, which allows the classifier to be easily augmented with new training data or even new classes. Furthermore, we show how the performance of the classifier is influenced by noise in the character images.
Figure 2.1: Block diagram of the entire classifier that discriminates between $N_c$ classes.



1.2 Structure 

The technical report is organized as follows. The training of the classifier is discussed in Section 2. Section 3 is devoted to the fundamentals of the classification procedure. In Section 4 we demonstrate the performance of the classifier. Finally, we give a conclusion and some remarks on future work in Section 5.



2 Training of the Classifier 

The classifier we introduce is based on one model for each class, where each model is built from the available training data. Thus, the entire classifier consists of $N_c$ models for the classification of $N_c$ classes. The structure is shown in Figure 2.1. To keep the derivation of the training and classification procedure simple, we only consider the model of a single class. Thus, the superscripts indicating the class are omitted in the following. Before the training procedure of the trellis-based classifier is introduced, we start with some assumptions.

The character image $G \in \mathbb{B}^{M \times N}$ with $M$ columns, $N$ rows, and gray values in $\mathbb{B} = \{0, 1, \ldots, 255\} \subset \mathbb{N}_0$ is represented by the column vector $g \in \mathbb{B}^K$ of dimension $K = M \cdot N$. This allows an image to be interpreted as a signal sequence, similar to a received discrete signal transmitted over a communication channel.
Figure 2.2: Basic idea of a part of a model representing one class.



During training we determine a model for each class. A model consists of vertices $s_j(k)$ representing gray values, called states in the following, and of edges assigned with weights $w_{ij}(k)$. The weights can be interpreted as the cost of the transition from state $s_i(k-1)$ to state $s_j(k)$, where $k$ denotes the pixel position in the image vector $g$. This is analogous to the transition probabilities in hidden Markov models. The structure of one model is illustrated in Figure 2.2, where the transitions from state to state are indicated by arrows. The big arrows are a simplification: they stand for transitions like those indicated by the arrows on the left-hand side. Formally, the states in Figure 2.2 can be represented by the matrix



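$$
S := \begin{bmatrix} g_1 & g_2 & \cdots & g_P \end{bmatrix}^{T} =
\begin{bmatrix}
s_1(0) & \cdots & s_1(K-1) \\
\vdots & \ddots & \vdots \\
s_P(0) & \cdots & s_P(K-1)
\end{bmatrix}
$$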
in which the $P$ given training samples of one class are arranged row by row. Matrix $S$ contains the different states $s_j(k)$ row by row, whereas the columns denote the pixel position $k$. In most cases the same gray value appears multiple times at position $k$. Such duplicates have to be evaluated only once, since they lead to the same result in classification. They can therefore be merged into one state with the corresponding gray value. Hence, every possible state is contained only once in a column, and the valid states can be aligned at the beginning of each column $k$. Every column now contains at most $P$ valid states. To keep the matrix structure, the remaining elements of matrix $S$ are filled with $-1$. The corresponding transition weights can thus be neglected as well, i.e., they are set to infinity. For the model, the remaining weights $w_{ij}(k) \in (0, 1]$ are determined according to




$k = 1, \ldots, K-1 \qquad (2.1)$



where $|e_{ij}(k)|$ denotes the number of transitions from state $s_i(k-1)$ to $s_j(k)$ over all training samples of one class. This favors transitions that occur more frequently in the training data, i.e., such paths are more likely.

This procedure has to be performed $N_c$ times, since we need $N_c$ models to discriminate $N_c$ classes. The computational complexity of the training procedure stays within reasonable limits, since the procedure is quite simple. This simplicity has further advantages. One of them is that the classifier can easily be augmented by a new class, since only a model for that class has to be added to the existing classifier models. Another one is that the existing classifier models can easily be augmented by new training data without reconsidering the old data. For new training data, only the states that do not yet exist and the corresponding weights have to be added to the model. If a state already exists, only the corresponding weights have to be adapted.
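The training and augmentation steps can be sketched as follows. The sketch only covers what the text specifies: merging duplicate gray values into states and counting the transitions $|e_{ij}(k)|$. The conversion of counts into weights follows equation (2.1) and is deliberately left out. The sketch also makes these assumptions:

- States are kept as sets of gray values per position, and transitions as counters of gray-value pairs. The report uses the padded matrix $S$ instead.
- The function names `new_model` and `add_training_samples` are hypothetical.

```python
from collections import Counter


def new_model(K):
    """Empty class model for image vectors of length K."""
    return {"states": [set() for _ in range(K)],
            "counts": [Counter() for _ in range(K)]}  # counts[0] stays empty


def add_training_samples(model, samples):
    """Add flattened training images (sequences of K gray values) to a model.

    Duplicate gray values at a position are merged into one state, and
    counts[k][(a, b)] holds |e_ij(k)| for the transition from state a at
    position k-1 to state b at position k. Existing states and counts are
    only extended, so old training data never has to be revisited.
    """
    states, counts = model["states"], model["counts"]
    for g in samples:
        if len(g) != len(states):
            raise ValueError("sample length must equal K")
        states[0].add(g[0])
        for k in range(1, len(g)):
            states[k].add(g[k])
            counts[k][(g[k - 1], g[k])] += 1
    return model


# Initial training with two samples, then augmentation with a third one
model = new_model(3)
add_training_samples(model, [[0, 5, 5], [0, 7, 5]])
add_training_samples(model, [[3, 5, 5]])
```

Augmenting with the third sample only adds the new state 3 at position 0 and increments the affected transition counts. The earlier samples are not touched again.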

3 Classification 

Classification requires the evaluation of all models. This is done by determining the path of minimal cost (the shortest path) for every model of the different classes. According to the principle of optimality [Ber05], we can apply dynamic programming, since the path can be determined recursively. Dynamic programming reduces the computational complexity of determining the shortest path through a trellis diagram. For this reason, and because this classifier is similar to signal detection in communication systems, the Viterbi algorithm is used.

3.1 Viterbi Algorithm 

Andrew Viterbi [Vit67] introduced the Viterbi algorithm in 1967. It is a so-called forward dynamic programming algorithm, which can be used to determine the shortest path through a trellis diagram. The advantage of the Viterbi algorithm is that it reduces the computational effort of evaluating one trellis from a product

$$
O\left(\prod_{k=0}^{K-1} B(k)\right)
$$

to a sum

$$
O\left(\sum_{k=1}^{K-1} B(k-1)\,B(k)\right),
$$

where $K$ is the length of the image vector $g$ and $B(k)$ denotes the number of states in the trellis diagram at position $k$.

The determination of the shortest path through the trellis diagram starts at image position $k = 0$ and recursively proceeds to the last position $k = K-1$. For all positions $k$, the squared difference between $g_k$ and all given states $s_j(k)$ of the trellis diagram is calculated. Here, $g_k$ denotes the element at position $k$ of the vector $g$, which represents the image to be classified. Formally, this can be expressed as

$$
\lambda_j(k) = \bigl(g_k - s_j(k)\bigr)^2, \qquad k = 0, \ldots, K-1.
$$

In the next step, the shortest distance $\Lambda_j(k)$ leading to state $s_j(k)$ at position $k$ is calculated. At position $k = 0$ it is just equal to the squared difference

$$
\Lambda_j(0) = \lambda_j(0).
$$

If $k > 0$, the metric is given by

$$
\Lambda_j(k) = \min_i \bigl\{\Lambda_i(k-1) + w_{ij}(k)\,\lambda_j(k)\bigr\}, \qquad k = 1, \ldots, K-1.
$$

That is, the metric is the minimum, over all states $i$ at position $k-1$, of the sum of the shortest distance in state $s_i(k-1)$ and the weighted squared difference $\lambda_j(k)$. When the last pixel $k = K-1$ of the image is reached, the algorithm ends, and the cost of the shortest path is determined by

$$
\Lambda_{\min}(K-1) = \min_j \Lambda_j(K-1). \tag{3.1}
$$

The algorithm described so far concerns the evaluation of one model. Hence, it has to be repeated $N_c$ times if the classifier is trained with $N_c$ models to discriminate between $N_c$ classes. In the next section, the entire classifier, which consists of $N_c$ models, is described in detail. Further information concerning the Viterbi algorithm can be found in [Vit67, For73, Ber05, MS00].
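The recursion above and the decision rule of the complete classifier (Section 3.2) can be sketched with NumPy. This is a hedged sketch, not the authors’ implementation. It makes these assumptions:

- Each model is given as a list of state arrays `states[k]` and dense weight matrices `weights[k]` of shape $B(k-1) \times B(k)$.
- `np.inf` marks transitions that do not exist in the model.
- The function names are hypothetical.

```python
import numpy as np


def viterbi_min_cost(g, states, weights):
    """Lambda_min(K-1) of one model for the image vector g.

    states[k]:  1-D array with the B(k) gray values s_j(k)
    weights[k]: (B(k-1), B(k)) array of w_ij(k) for k >= 1 (weights[0] unused)
    """
    g = np.asarray(g, dtype=float)
    lam = (g[0] - np.asarray(states[0], dtype=float)) ** 2      # lambda_j(0)
    metric = lam                                                  # Lambda_j(0)
    for k in range(1, len(g)):
        lam = (g[k] - np.asarray(states[k], dtype=float)) ** 2   # lambda_j(k)
        w = np.asarray(weights[k], dtype=float)
        with np.errstate(invalid="ignore"):       # inf * 0 would give nan
            step = w * lam[None, :]               # w_ij(k) * lambda_j(k)
        step[np.isinf(w)] = np.inf                # missing transitions stay impossible
        # Lambda_j(k) = min_i { Lambda_i(k-1) + w_ij(k) * lambda_j(k) }
        metric = np.min(metric[:, None] + step, axis=0)
    return float(metric.min())                    # equation (3.1)


def classify(g, models):
    """Index of the model (class) with the smallest shortest-path cost."""
    costs = [viterbi_min_cost(g, m["states"], m["weights"]) for m in models]
    return int(np.argmin(costs))
```

Each loop iteration touches a $B(k-1) \times B(k)$ matrix. The cost per model is therefore the sum $\sum_k B(k-1)B(k)$ rather than the number of paths.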



3.2 The Classifier 



In this section we discuss the entire classifier, which consists of $N_c$ models, one for each of the $N_c$ classes. Each model is evaluated by the Viterbi algorithm as described in the previous section. Hence, the computational effort of the entire classifier increases $N_c$-fold to

$$
O\left(N_c \sum_{k=1}^{K-1} B(k-1)\,B(k)\right).
$$

The shortest path of each model is determined according to equation (3.1) and forms the basis of the decision. The winning class $\hat{\omega}$ is finally given by the minimal cost over all $N_c$ shortest paths:

$$
\hat{\omega} = \arg\min_{\omega} \Lambda^{(\omega)}_{\min}(K-1).
$$

To simplify the derivation, the superscript $\omega$ indicating the class was omitted in Section 2. It is reintroduced here, since the classification task involves more than one class. The block diagram of the entire classifier is given in Figure 2.1.



4 Experiments 

In this section the experimental results of the trellis-based classifier are discussed. 
For that purpose a character dataset was created and randomly split into a training 
and a test set. Additionally, the test set was affected by different noise levels to 
show the robustness of this classifier. 



4.1 Character Image Dataset 



For the character database, the digits zero to nine were printed in several fonts and different sizes (see Table 4.1). This variation in appearance is important, since the classification task introduced in Section 1.1 requires a certain robustness against such changes. For digitization, the printed pages were captured with an industrial camera. The single characters were then separated such that each image contains only one character. Finally, to obtain a consistent size, all characters were scaled to images of 24 × 24 pixels. Since two disjoint datasets are needed for training and testing, the character dataset was randomly divided. The training set contains 5014 samples and the test set 1256 samples, with all classes almost equally distributed in both sets. To equalize illumination and contrast, the single character images were normalized with respect to the mean and variance of the gray values in each image.
Fonts:  Arial, Bookman-Old-Style, Book-Antiqua, Century, Courier-New, Garamond, Lucida-Console, Lucida-Sans-Unicode, MS-Serif, Tahoma, Times-New-Roman, Verdana
Types:  normal, bold, italic, italic-bold
Sizes:  10, 12, 14, 16, 18

Table 4.1: All fonts contained in the character dataset.



4.2 Experimental Results 



Training on the training set described above results in a classifier with ten models, which have on average 85 states per pixel position. In addition, approximately $5 \cdot 10^6$ transition weights are determined during training for every model on average.
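A rough back-of-the-envelope estimate puts these numbers into perspective. It rests on a simplifying assumption that is ours, not the report’s: every one of the $K = 24 \cdot 24 = 576$ positions has exactly the average of $B(k) = 85$ states. The Viterbi sum and the number of paths a brute-force evaluation would have to visit are then:

```latex
\sum_{k=1}^{K-1} B(k-1)\,B(k) \approx 575 \cdot 85^2 = 4\,154\,375 \approx 4.2 \cdot 10^6,
\qquad
\prod_{k=0}^{K-1} B(k) \approx 85^{576} \approx 2.2 \cdot 10^{1111}.
```

The first value has the same order of magnitude as the roughly $5 \cdot 10^6$ transition weights per model reported above. The second shows why exhaustive path enumeration is infeasible without dynamic programming.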



First, the classifier is tested on the training set, on which it classifies all characters correctly. Next, we determine the error rate on the test set containing 1256 samples. Again, all characters are classified correctly, which indicates that no overfitting has occurred. Figure 4.1 shows a boxplot in which the minimal values $\Lambda^{(\omega)}_{\min}(K-1)$ of all images containing the number seven are plotted against all classes. The plot shows that the values $\Lambda^{(7)}_{\min}(K-1)$ of model seven are significantly lower (by about two orders of magnitude) than the values $\Lambda^{(\omega)}_{\min}(K-1)$ of the other models. Since the numbers one and seven are similar, the values $\Lambda^{(1)}_{\min}(K-1)$ of model one are the second lowest. Despite many outliers for model seven, all sevens are classified correctly.

Figure 4.1: Boxplot of $\Lambda^{(\omega)}_{\min}(K-1)$ for all images of class 7 contained in the test dataset.

Figure 4.2: All misclassified numbers of the test set affected by zero-mean white Gaussian noise with $\sigma = 25.5$ (actual → assigned class: 6 → 8, 6 → 8, 8 → 0, 2 → 1, 7 → 1, 2 → 1).



If the character images contain errors, the values $\Lambda^{(\omega)}_{\min}(K-1)$ of all models increase and move closer to each other, i.e., the values $\Lambda^{(7)}_{\min}(K-1)$ are no longer two orders of magnitude smaller. For example, if noise is added to the character images of all sevens, the values $\Lambda^{(\omega)}_{\min}(K-1)$ in Figure 4.1 increase. In particular, the values $\Lambda^{(7)}_{\min}(K-1)$ increase dramatically, which makes the discrimination of the classes more challenging.



To further assess the performance of the trellis-based classifier, we perform another experiment in which the test set is affected by additive zero-mean Gaussian noise with standard deviation $\sigma = 25.5$. This causes a slight increase of the error rate to 0.48%, which corresponds to the six misclassified characters shown in Figure 4.2. Most of the misclassified numbers are similar to the numbers assigned by the classifier. For example, a six looks similar to an eight, and an eight looks similar to a zero, a resemblance that the noise additionally amplifies. The numbers underneath the characters indicate the actual class (left of the arrow) and the class assigned by the classifier (right of the arrow). Furthermore, the figure shows the noisy images, in which one can see how the noise influences the characters.
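The noise experiment can be reproduced in outline as follows. This sketch makes assumptions that the report does not state:

- Noise is added to the 8-bit gray values before the mean/variance normalization.
- The results are rounded and clipped to $\{0, \ldots, 255\}$.
- A fixed random seed is used.
- `add_gaussian_noise` and `error_rate` are hypothetical helper names.

```python
import numpy as np


def add_gaussian_noise(images, sigma, seed=0):
    """Add zero-mean white Gaussian noise with standard deviation sigma.

    Rounding and clipping to the gray-value range {0, ..., 255} are
    assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    images = np.asarray(images, dtype=float)
    noisy = images + rng.normal(0.0, sigma, size=images.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


def error_rate(true_labels, predicted_labels):
    """Fraction of misclassified samples."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    return float(np.mean(true_labels != predicted_labels))
```

As a check of the reported figure, 6 errors out of 1256 test samples give an error rate of about 0.0048, i.e., the 0.48% stated above.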



For the next experiment we increase the standard deviation to $\sigma = 44.2$. In this experiment the trellis-based classifier shows an error rate of 2.79%, which is acceptable considering the errors caused by the additive noise. All misclassified characters can be found in Figure 4.3. As in the previous experiment, most of the misclassified numbers look similar to the assigned numbers. Conspicuously, Figure 4.3 shows many twos classified as ones, although the two characters do not have significant similarities.

Figure 4.3: All misclassified numbers of the test set affected by zero-mean white Gaussian noise with $\sigma = 44.2$.



4.3 Discussion 



In summary, this new kind of classifier is very easy to train. Hence, the classifier models can easily be augmented by new training data, or new models can be added if the new training data contains new classes. The experimental results show that the performance on noise-affected character images decreases, as expected, but remains appropriate. One main drawback is that many weights have to be stored and processed during classification, which mainly affects the classification speed. Furthermore, the weights are determined according to equation (2.1), which is not an appropriate approach. It will probably be better to determine the weights adaptively during training, similar to the determination of the weights in neural networks. We assume that this will improve the models of the classifier, since they will probably be better adapted to the training data, provided that overfitting is avoided. Hence, we expect the classification performance to improve as well.
5 Conclusion

We have presented the fundamentals of a classifier that is based on the evaluation of trellis diagrams using the Viterbi algorithm. The training procedure has been discussed in detail, and its simplicity has been pointed out. This simplicity allows the classifier to be easily augmented with new training data or even a new class.

Finally, the performance of the classifier has been demonstrated on the example of character recognition. Furthermore, we have shown the performance on a character dataset affected by noise of different levels. As expected, the error rate increases with a higher standard deviation of the noise, but it still remains low.

In future work we will develop an adaptive determination of the transition weights during the training procedure. Additionally, we will investigate whether the number of states can be reduced without a significant loss of classification performance. Such a reduction would lower the computational complexity and thus speed up classification.
Abstract: Automated cooperative maneuvers could help to avoid or mitigate accidents in road traffic. This report presents a tree search approach for cooperative motion planning. Different branch and bound methods relying on precomputation and pruning are explored. Simulation results show that successful collision avoidance in intersection scenarios is achievable with acceptable computation times.



1 Introduction 

Advances in communication technology enable wireless vehicle-to-vehicle communication. This capability may be used to increase road safety. Until now, research has focused on warning systems [MS06] and on vehicle platoons [Var93]. However, cooperative maneuvers offer a large, not yet exploited potential for collision avoidance and mitigation. By negotiating a cooperative motion plan using wireless communication, vehicles might automatically intervene in dangerous situations and prevent accidents more effectively than a driver could on their own. Scenarios where the gain of cooperative actions becomes evident include overtaking, obstacle avoidance and intersection situations [FBB08]. This report describes a method to plan a cooperative maneuver for the vehicles within a cooperative group [FBWB08].

Previous approaches to motion planning for multiple vehicles or robots usually 
make certain assumptions that simplify the problem. Common simplifications 
are motion planning in a fixed order of priority [ELP87, vdBO05] and decoupling 
of path planning and velocity planning [KZ86]. These assumptions narrow the 
space of possible solutions, especially in the present context of vehicles with significant 
dynamics and high velocities [FBB09]. Therefore, this report pursues cooperative 
motion planning without those assumptions. While it is rather 
straightforward to formulate the planning problem in the composite configuration 
space [LaV06], this approach has rarely been applied in practice due to its computational 
complexity. In this report, methods to alleviate this complexity are investigated. 
In particular, it is shown that precomputing certain information enables 
a more efficient search. 

The report is organized as follows. Section 2 formulates the problem of cooperative 
motion planning. In Section 3, a loss functional and models of the vehicles and 
their actions are presented. Section 4 proposes different algorithms to solve the 
cooperative planning problem using branch and bound search. Results are discussed 
in Section 5. Finally, conclusions are presented in Section 6. 



2 Problem Formulation 

The following problem is considered in this report: given the initial positions $\mathbf{x}_0$ 
and velocities $\mathbf{v}_0$ of the $m$ vehicles within a cooperative group and a time interval 
$[0, t_{max}]$, plan a cooperative maneuver $\mathbf{x}(t)$ that optimizes certain criteria, including 
collision avoidance. In contrast to most work on motion planning, the goal 
positions are not predefined. As the method is only applied to circumvent dangerous 
traffic situations, the goal positions can be optimized by the planning algorithm in order to 
achieve the best collision-free cooperative motion. 

Positions, states, capabilities, and geometric properties of the vehicles are assumed 
to be known from the common relevant picture of the cooperative group [FBB08]. 
The resulting cooperative maneuver is transmitted to all members of the group. 
Each vehicle performs a detailed planning within the tolerances determined by the 
cooperative plan and executes its part of the maneuver. These steps are outside the 
scope of this report. 



3 Modeling 

The planning algorithm described in this report has the advantage of explicitly 
specifying the vehicle model, the vehicle actions, and the loss functional. This 
makes it possible to trace effects observed in the resulting plans back to the model 
assumptions, and hence enables a systematic design of the models. The models currently 
used are presented in the remainder of this section. 
3.1 Vehicle Model 

The configuration of the $i$th vehicle is denoted by $\mathbf{x}_i = (x_i, y_i, \varphi_i)^T$, where 
$(x_i, y_i)^T$ is its position in the road plane and $\varphi_i$ is its orientation. The linear 
velocity $v_i$ is regarded as an additional state variable. The control variables are the 
longitudinal acceleration $acc_i$ and the steering angle $\alpha_i$. 

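This yields the following vehicle model: 

$$\frac{d}{dt}\begin{pmatrix} x_i \\ y_i \\ \varphi_i \\ v_i \end{pmatrix} = \begin{pmatrix} v_i \cos\varphi_i \\ v_i \sin\varphi_i \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ v_i \dfrac{\tan\alpha_i}{l_i} \\ acc_i \end{pmatrix},$$

where $l_i$ is the wheelbase of the vehicle. 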

The abstraction from slip angles and tire friction coefficients seems admissible, 
as such effects can be considered during the detailed planning performed 
subsequently by each vehicle. However, the presented algorithm is not restricted 
to this particular model. The model can be replaced by a more accurate one if 
necessary. 

3.2 Actions 

To allow a finite search, both time and the continuous control space are discretized. 
Time is divided into equidistant intervals of duration $\Delta t$. The planning horizon 
consists of $T$ such intervals: $t_{max} = T \cdot \Delta t$. Decisions are made at the points in 
time $t = 0, \Delta t, \ldots, (T-1)\Delta t$, and the corresponding actions are executed for the 
next time interval. 

An action of the $i$th vehicle is denoted by $a_i$, the set of all considered actions of the 
vehicle by $A_i$. A cooperative action of $m$ vehicles is a vector $\mathbf{a} = (a_1, \ldots, a_m)^T \in A$ 
with $a_i \in A_i$. For simplicity of the analysis, we sometimes assume that all 
vehicles have identical action sets $A_0$. In this case, $A = A_0^m$. 

The action $a_i$ can be expressed in terms of the control inputs $acc_i$ and $\alpha_i$. Its execution 
is simulated by numerically integrating the vehicle model of Section 3.1 over the time 
interval $\Delta t$. The resulting trajectory $\mathbf{x}_i : [j\Delta t, (j+1)\Delta t] \to \mathbb{R}^3$ is denoted 
by $f(\mathbf{x}_i(j\Delta t), v_i(j\Delta t), a_i, \Delta t)$, where $\mathbf{x}_i(j\Delta t)$ and $v_i(j\Delta t)$ are the position and 
velocity, respectively, of the $i$th vehicle at the beginning of the time interval. Analogously, 
$f(\mathbf{x}(j\Delta t), \mathbf{v}(j\Delta t), \mathbf{a}, \Delta t)$ denotes the trajectories $\mathbf{x} : [j\Delta t, (j+1)\Delta t] \to \mathbb{R}^{3m}$ 
of a cooperative motion. 
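The numerical integration of one action can be sketched as follows. The report does not name the integrator, so this sketch assumes classic fourth-order Runge-Kutta with a fixed number of substeps; `vehicle_derivative`, `simulate_action` and `substeps` are hypothetical names, and the returned list of sampled states stands in for the trajectory $f(\mathbf{x}_i(j\Delta t), v_i(j\Delta t), a_i, \Delta t)$. The velocity is not clamped at zero, so a long hard-braking interval would drive the vehicle backwards; handling standstill is left out of this sketch.

```python
import math

def vehicle_derivative(state, acc, steer, wheelbase):
    """Right-hand side of the kinematic vehicle model; state = (x, y, phi, v)."""
    _, _, phi, v = state
    return (
        v * math.cos(phi),                # dx/dt
        v * math.sin(phi),                # dy/dt
        v * math.tan(steer) / wheelbase,  # dphi/dt
        acc,                              # dv/dt
    )

def simulate_action(state, acc, steer, wheelbase, dt, substeps=20):
    """Integrate the model over one decision interval of length dt with
    fixed-step classic RK4; returns the sampled states, start included."""
    h = dt / substeps
    s = tuple(state)
    samples = [s]

    def shifted(base, k, factor):
        # component-wise base + factor * k
        return tuple(b + factor * ki for b, ki in zip(base, k))

    for _ in range(substeps):
        k1 = vehicle_derivative(s, acc, steer, wheelbase)
        k2 = vehicle_derivative(shifted(s, k1, h / 2), acc, steer, wheelbase)
        k3 = vehicle_derivative(shifted(s, k2, h / 2), acc, steer, wheelbase)
        k4 = vehicle_derivative(shifted(s, k3, h), acc, steer, wheelbase)
        s = tuple(si + h / 6 * (a + 2 * b + 2 * c + d)
                  for si, a, b, c, d in zip(s, k1, k2, k3, k4))
        samples.append(s)
    return samples
```

For example, `simulate_action((0.0, 0.0, 0.0, 10.0), 0.0, 0.3, 2.7, 1.0)` samples one second of a constant-speed left turn with an assumed wheelbase of 2.7 m; the final heading equals $v_i \tan\alpha_i / l_i \cdot \Delta t$, as the model prescribes.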

In this work, the modeled action alternatives consist of 
• turning sharp left, going straight, turning sharp right, 

• hard braking, driving with constant speed, accelerating, 

and combinations thereof. This choice has been inspired by the emergency 
maneuvers commonly considered for a single vehicle [SOB06]. 
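The action sets can be enumerated as Cartesian products. This sketch assumes that "combinations thereof" means every steering option paired with every longitudinal option, which gives $|A_0| = 9$; the numeric control values in `STEER_ANGLE` and `ACCELERATION` are hypothetical placeholders, since the report does not state them. With identical action sets, the cooperative set is $A = A_0^m$.

```python
from itertools import product

STEERING = ("sharp_left", "straight", "sharp_right")
LONGITUDINAL = ("hard_brake", "constant_speed", "accelerate")

# Hypothetical control values (not given in the report).
STEER_ANGLE = {"sharp_left": 0.5, "straight": 0.0, "sharp_right": -0.5}        # rad
ACCELERATION = {"hard_brake": -8.0, "constant_speed": 0.0, "accelerate": 2.0}  # m/s^2

def single_vehicle_actions():
    """A_0: every combination of one steering and one longitudinal choice."""
    return list(product(STEERING, LONGITUDINAL))

def cooperative_actions(m):
    """A = A_0^m: one action per vehicle, all combinations."""
    return list(product(single_vehicle_actions(), repeat=m))

def to_controls(action):
    """Map a symbolic action a_i to the control inputs (acc_i, alpha_i)."""
    steering, longitudinal = action
    return ACCELERATION[longitudinal], STEER_ANGLE[steering]
```

Keeping actions symbolic and converting them to controls only when a segment is simulated makes it easy to swap the discretization, which the conclusion mentions may be necessary for other scenarios.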

3.3 Loss Functional 

A loss functional $L$ is used to evaluate a cooperative trajectory $\mathbf{x}(t)$. The loss 
functional can be considered the negative of a utility functional. Different aspects, 
including collisions, road departure, and control energy, are incorporated as a sum 
of several terms: 

$$L\{\mathbf{x}(t)\} := L_{coll}\{\mathbf{x}(t)\} + L_{obst}\{\mathbf{x}(t)\} + L_{road}\{\mathbf{x}(t)\} + L_{control}\{\mathbf{x}(t)\}$$

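For example, a simple approach to penalize collisions is as follows: 

$$L_{coll}\{\mathbf{x}(t)\} := \lambda_{coll} \int_0^{t_{max}} \sum_{i=1}^{m} \sum_{j=i+1}^{m} \mathrm{coll}(\mathbf{x}_i(t), \mathbf{x}_j(t))\, dt$$

with the penalty factor $\lambda_{coll} \gg 0$ and 

$$\mathrm{coll}(\mathbf{x}_i, \mathbf{x}_j) := \begin{cases} 1 & \text{if vehicle } i \text{ in configuration } \mathbf{x}_i \text{ and vehicle } j \text{ in configuration } \mathbf{x}_j \text{ are in geometric collision,} \\ 0 & \text{otherwise.} \end{cases}$$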
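The collision term can be approximated numerically by sampling the trajectories. This is a minimal sketch under two stated assumptions: each vehicle footprint is approximated by a disc of hypothetical radius `radius` (the report uses the actual vehicle geometry), and the time integral is replaced by a left Riemann sum over synchronously sampled positions. The names `in_collision`, `collision_loss` and `dt_sample` are illustrative.

```python
import math

def in_collision(p, q, radius=2.5):
    """Hypothetical geometric test: both footprints are discs of this radius."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) < 2.0 * radius

def collision_loss(trajectories, dt_sample, lam_coll=1e6, radius=2.5):
    """Approximate L_coll = lam_coll * integral_0^tmax sum_{i<j} coll(x_i, x_j) dt.

    trajectories[i][k] is the (x, y) position of vehicle i at time k * dt_sample;
    sample k stands for the whole interval [k, k + 1) * dt_sample.
    """
    m = len(trajectories)
    n_intervals = len(trajectories[0]) - 1
    colliding_time = 0.0
    for k in range(n_intervals):
        for i in range(m):
            for j in range(i + 1, m):          # each unordered pair once
                if in_collision(trajectories[i][k], trajectories[j][k], radius):
                    colliding_time += dt_sample
    return lam_coll * colliding_time
```

Because every pair is summed separately, the loss decomposes into pairwise contributions, which is the property exploited by the collision precomputation in Section 4.5.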

A more sophisticated loss functional could use information on the collision 
momentum to estimate the risk of injury [MH00, WW02]. 

The terms $L_{obst}$ for collisions with obstacles and $L_{road}$ for road departure can be 
defined in a similar way. Obstacles may be stationary or moving on a known (or 
predicted) trajectory. 

The control term $L_{control}$ penalizes longitudinal and lateral accelerations, which 
cause energy consumption and passenger discomfort. 



4 Branch and Bound Search 

4.1 Tree Structure 



Considering the possible decisions of one vehicle over time, a tree of possible 
action sequences is obtained (Figure 4.1). At each node, a decision among $|A_0|$ 
different actions is performed. A level of the tree corresponds to a point in time. 
The tree has $|A_0|^T$ leaves, each representing a distinct action sequence. 

Figure 4.1: Tree structure of one vehicle's decisions (child nodes labeled $c_1(n_i, a_i)$). 

For $m$ vehicles, the difference between decoupled and cooperative planning 
can be demonstrated by the tree structures: decoupled planning yields $m$ separate 
trees with a total of $m|A_0|^T$ leaves, while cooperative planning results in one tree 
having $|A_0|^{mT}$ leaves. A node of the cooperative tree has $|A| = |A_0|^m$ children, 
corresponding to all possible combinations of the vehicles' actions at one point in 
time. 
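As a worked example of these counts, assume (hypothetically, since the report does not state the size of the action set) that the alternatives of Section 3.2 yield $|A_0| = 9$, and take $m = 3$ vehicles and $T = 3$ decision points, the setting of Table 5.1:

$$m|A_0|^T = 3 \cdot 9^3 = 2187, \qquad |A_0|^{mT} = 9^{9} = 387\,420\,489, \qquad |A| = |A_0|^m = 9^3 = 729 .$$

Under this assumption, the cooperative tree has almost 400 million leaves, whereas the three decoupled trees together have about two thousand.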

In order to find the best sequence of actions, it is not necessary to construct the 
entire tree. Instead, the algorithm only stores the path corresponding to the action 
sequence currently under consideration. Additionally, the best path found so far 
and its loss value L* are memorized. 

A naive depth-first search for the best solution can become computationally intensive, 
as indicated by the exponential number of leaves. Therefore, methods for 
reducing computation time without affecting the quality of the solution have been 
investigated. In principle, two approaches are possible: 



1. Reduce the number of nodes expanded during search. 

2. Reduce the processing time per node. 



Both approaches will be detailed below, after the necessary notation has been 
introduced. 
4.2 Notation 



A node of a tree is denoted by $n$, its parent node by $p(n)$. The child node of $n$ obtained 
after executing the action $\mathbf{a}$ is given by the function $c_m(n, \mathbf{a})$. The position 
vector of the vehicles and the previously executed action vector are annotated to 
a node $n$ as $\mathbf{x}(n)$ and $\mathbf{a}(n)$, respectively, so that $\mathbf{a}(c_m(n, \mathbf{a}_0)) = \mathbf{a}_0$. The loss 
functional and its components can be applied to node $n$ as follows: 

$$L(n) := \begin{cases} 0 & \text{if } n \text{ is the root,} \\ L(p(n)) + L\{f(\mathbf{x}(p(n)), \mathbf{v}(p(n)), \mathbf{a}(n), \Delta t)\} & \text{otherwise.} \end{cases}$$

If the tree consists only of the decisions of vehicle $i$ or of vehicles $i, j$, the nodes 
are labeled $n_i$ or $n_{ij}$, respectively. The function $n_k(n, i_1, \ldots, i_k)$ returns the node 
of the $k$-vehicle tree that corresponds to $n$. 



4.3 Branch and Bound 



Branch and bound search is a standard method for eliminating irrelevant nodes 
from the search. A node and all of its descendants can be pruned if it is known 
that the node cannot be contained in the optimal solution path. Let $L(n)$ be the loss value 
accumulated on the path from the root to node $n$, and $L^*$ the loss value of the 
best solution known so far, identified by leaf node $n^*$. Then, the loss value of 
any solution containing $n$ is at least $L(n)$, because the loss functional is 
additive over time and non-negative. This means that $n$ can be pruned if 

$$L(n) \geq L^* .$$



Standard branch and bound can reduce the number of visited nodes considerably 
in many cases; however, a further reduction is desirable. This requires a more 
informed pruning criterion, i.e., the incorporation of knowledge about the subtree 
under consideration. If a lower bound $h(n)$ for the minimal loss that any path 
in the subtree still accumulates below $n$ can be obtained, the pruning can be improved 
as follows (see Figure 4.2): the subtree can be eliminated if 

$$L(n) + h(n) \geq L^* .$$

The resulting general branch and bound search method is presented in Algorithm 4.1. 

The tighter the bound h(n), the better the pruning. However, the complexity of 
computing the criterion has to be traded off against the gain resulting from the 
more effective pruning. 
Algorithm 4.1 General branch and bound method for cooperative motion planning 

procedure SEARCH(t, n) 
    for all a ∈ A do 
        n_c ← c_m(n, a) 
        compute x(n_c) and L(n_c) 
        if L(n_c) + h(n_c) < L* then 
            if t + Δt < t_max then 
                SEARCH(t + Δt, n_c) 
            else 
                L* ← L(n_c) 
                n* ← n_c 
            end if 
        end if 
    end for 
end procedure 

procedure PLAN(x_0, v_0) 
    perform precomputation 
    L* ← ∞ 
    L(root) ← 0 
    x(root) ← x_0 
    v(root) ← v_0 
    SEARCH(0, root) 
    return n* 
end procedure 
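Algorithm 4.1 can be sketched in Python on an abstract tree. In this minimal sketch, the hypothetical callback `step(state, a)` returns the child state together with the non-negative loss increment $L(n_c) - L(p(n_c))$, and `h(state, depth)` plays the role of the lower bound $h(n)$; the default `h = 0` gives the standard branch and bound variant. Only the current path and the best solution are stored, as in the report.

```python
import math

def branch_and_bound(root_state, actions, T, step, h=lambda state, depth: 0.0):
    """Depth-first branch and bound in the spirit of Algorithm 4.1.

    step(state, a) -> (child_state, loss_increment), loss_increment >= 0.
    h(state, depth) must lower-bound the loss still to come below a node at
    the given depth (and should be 0 at depth T, i.e. at the leaves).
    Returns (L*, action sequence of the best leaf n*).
    """
    best = {"loss": math.inf, "path": None}  # L* and the path to n*
    path = []                                # only the current path is stored

    def search(depth, state, acc_loss):
        for a in actions:
            child, inc = step(state, a)
            child_loss = acc_loss + inc      # L(n_c) = L(p(n_c)) + increment
            if child_loss + h(child, depth + 1) < best["loss"]:
                path.append(a)
                if depth + 1 < T:            # interior node: descend
                    search(depth + 1, child, child_loss)
                else:                        # leaf: new best solution
                    best["loss"], best["path"] = child_loss, tuple(path)
                path.pop()
            # otherwise the whole subtree below the child is pruned

    search(0, root_state, 0.0)
    return best["loss"], best["path"]
```

In the cooperative planner, `actions` would be the joint action set $A$, and `step` would simulate all vehicles for $\Delta t$ and evaluate the loss functional on the resulting segment.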



Figure 4.2: Criterion to prune a subtree. 
4.4 Precomputation of Single Vehicle Information 

A key to the derivation of a first lower bound $h_1(n)$ for a subtree is the observation 
that several terms of the loss functional depend only on a single vehicle's 
actions. This means that these terms can be computed within the single vehicle 
trees, without complexity exponential in the number of vehicles. 

In the precomputation phase, the vehicle position $\mathbf{x}_i(n_i)$ and the following loss 
value are computed for each node $n_i$: 

$$L_1(n_i) := L_{road}(n_i) + L_{obst}(n_i) + L_{control}(n_i)$$

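When ascending in the single vehicle tree, i.e., proceeding from the leaves towards the root, 
a lower bound $h_1(n_i)$ for the loss accumulated in the subtree rooted at $n_i$ is obtained as follows: 

$$h_1(n_i) := \begin{cases} 0 & \text{if } n_i \text{ is a leaf,} \\ \min_{a_i \in A_i} \left[ L_1(c_1(n_i, a_i)) - L_1(n_i) + h_1(c_1(n_i, a_i)) \right] & \text{otherwise.} \end{cases}$$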
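The bottom-up computation of $h_1$ can be sketched as a post-order traversal of one single-vehicle tree. This sketch assumes that a node $n_i$ is identified by its action prefix and that a hypothetical callback `seg_loss(prefix, a)` returns the increment $L_1(c_1(n_i, a)) - L_1(n_i) \geq 0$; in the planner, this increment would come from simulating the segment and evaluating the road, obstacle, and control terms.

```python
def precompute_h1(actions, T, seg_loss):
    """Compute h_1 for every node of one single-vehicle tree.

    A node n_i is identified by its action prefix (a tuple of length <= T);
    seg_loss(prefix, a) is the increment L_1(c_1(n_i, a)) - L_1(n_i) >= 0.
    Children are evaluated before their parent, i.e. the recursion ascends
    from the leaves to the root.
    """
    h1 = {}

    def visit(prefix):
        if len(prefix) == T:          # leaf: no loss left to accumulate
            h1[prefix] = 0.0
        else:
            h1[prefix] = min(seg_loss(prefix, a) + visit(prefix + (a,))
                             for a in actions)
        return h1[prefix]

    visit(())
    return h1
```

The single-vehicle tree has only $\sum_{k=0}^{T} |A_0|^k$ nodes, so this table is cheap compared with the cooperative tree.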



The gain resulting from this precomputation is twofold: 

1. A node $n$ and its subtree can be pruned from the cooperative search if 

$$L(n) + h_1(n) \geq L^* \tag{4.1}$$

with 

$$h_1(n) := \sum_{i=1}^{m} h_1(n_1(n, i)) .$$

2. The precomputed vehicle positions $\mathbf{x}_i(n_i)$ and loss values $L_1(n_i)$ are used 
during cooperative search. Thereby, the processing time per node is reduced 
drastically, as redundant computations are eliminated. This is possible because 
the position $\mathbf{x}_i$ of a vehicle is determined solely by its own actions, 
while the composite state $\mathbf{x}$ depends on the decisions of all vehicles. 

4.5 Precomputation of Collision Information 

As long as follow-up collisions are not considered, the collision loss $L_{coll}$ depends 
only on the decisions of two vehicles [LH98]. Therefore, the idea from the 
previous subsection can be pursued one step further: the collision information 
can be precomputed by constructing every two-vehicle tree in advance, yielding 
$L_2(n_{ij}) := L_{coll}(n_{ij})$. A lower bound for the collision loss in a subtree of 
a two-vehicle tree can be computed as follows: 

$$h_2(n_{ij}) := \begin{cases} 0 & \text{if } n_{ij} \text{ is a leaf,} \\ \min_{a_i \in A_i,\, a_j \in A_j} \left[ L_2(c_2(n_{ij}, a_i, a_j)) - L_2(n_{ij}) + h_2(c_2(n_{ij}, a_i, a_j)) \right] & \text{otherwise.} \end{cases}$$

These lower bounds improve the pruning criterion for the cooperative search: 

$$L(n) + h_1(n) + h_2(n) \geq L^* \tag{4.2}$$

with 

$$h_2(n) := \sum_{i=1}^{m} \sum_{j=i+1}^{m} h_2(n_2(n, i, j)) .$$
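Evaluating criterion (4.2) requires projecting a cooperative node onto the single-vehicle and two-vehicle trees, i.e., computing $n_1(n, i)$ and $n_2(n, i, j)$. This sketch assumes that a node is identified by its action prefix, where each entry of a cooperative prefix is a joint action $(a_1, \ldots, a_m)$, and that the precomputed bounds are stored in dictionaries keyed by projected prefixes; `project`, `prune`, `h1_tables` and `h2_tables` are hypothetical names.

```python
from itertools import combinations

def project(coop_prefix, vehicles):
    """n_k(n, i_1, ..., i_k): node of the k-vehicle tree corresponding to the
    cooperative node n, obtained by keeping only the listed vehicles' actions."""
    return tuple(tuple(joint[i] for i in vehicles) for joint in coop_prefix)

def prune(coop_prefix, acc_loss, best_loss, m, h1_tables, h2_tables):
    """Criterion (4.2): True if L(n) + h_1(n) + h_2(n) >= L*.

    h1_tables[i] maps nodes of vehicle i's tree to h_1; h2_tables[(i, j)]
    (with i < j) maps nodes of the two-vehicle tree of (i, j) to h_2.
    """
    h1 = sum(h1_tables[i][project(coop_prefix, (i,))] for i in range(m))
    h2 = sum(h2_tables[(i, j)][project(coop_prefix, (i, j))]
             for i, j in combinations(range(m), 2))
    return acc_loss + h1 + h2 >= best_loss
```

Each check costs $m$ plus $m(m-1)/2$ table lookups, which is why the bound is cheap during the search; the expense lies in building the $m(m-1)/2$ two-vehicle trees beforehand.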



5 Results 

5.1 Collision Avoidance 

The proposed algorithms have been evaluated on different intersection scenarios 
involving two to four vehicles. Figure 5.1 shows some examples of successful 
collision avoidance. The scenarios have been generated using the traffic simulator 
described in [VNB+07]. 



5.2 Planning Time 



In Figure 5.2, the runtime of the different algorithms is plotted on a logarithmic 
time axis. For a better comparison of the more efficient algorithms, Figure 5.3 
shows the range below a runtime of 3 seconds on a linear scale. The 
portions below the black line correspond to precomputation, the remainder to the branch 
and bound search. Each time shown is the mean value over more than a hundred 
different intersection scenarios with $m$ vehicles and $T$ decision points. The variants 
of the algorithm are specified as follows: 



branch-bound: Standard branch and bound without precomputation (Algorithm 4.1 
with $h(n) = 0$). 




Figure 5.1: Visualization of planned cooperative maneuvers involving two to four 
vehicles in intersection scenarios. 



Cooperative Motion Planning using Branch and Bound Methods 






algorithm       precomputation (s)   total time (s)     expanded nodes 
branch-bound    0.00 ± 0.00          533.36 ± 779.23    1100454 ± 1620584 
precomp L1      0.22 ± 0.02          1.73 ± 1.86        1100442 ± 1620703 
pruning h1      0.23 ± 0.01          0.23 ± 0.02        734 ± 2486 
pruning h2      0.53 ± 0.18          0.53 ± 0.18        112 ± 307 

Table 5.1: Mean and standard deviation of precomputation time, total runtime, and 
number of expanded nodes for 150 simulation scenarios with $m = 3$ and $T = 3$ 
(all times in seconds). 



precomp L1: Precomputation of the single vehicle information (positions 
$\mathbf{x}_i(n_i)$ and loss values $L_1(n_i)$), but standard pruning ($h(n) = 0$). 

This variant is shown to separate the gain resulting from avoiding 
redundant computations from the additional gain due to more 
effective pruning. 



pruning h1: Precomputation of the single vehicle information, including 
$h_1(n_i)$, and pruning with criterion (4.1). 

pruning h2: Additional precomputation of the collision information $L_2(n_{ij})$, 
and pruning with criterion (4.2). 



The precomputation results in a substantial performance improvement. The pruning 
reduces the number of expanded nodes by several orders of magnitude (Figure 5.4). 
The variance of the runtime also decreases, as the precomputation effort 
is less dependent on the particular scenario (Table 5.1). However, the precomputation 
of the collision information is advantageous only in rare cases, because the 
precomputation effort often outweighs the pruning gain. 



5.3 Search Strategy 

It is interesting to compare the described depth-first branch and bound strategy 
with the A* search strategy [RN03]. Preliminary results on this issue show no 
clear preference in the average case. However, some scenarios that seem to be 
rather easy for the branch and bound method are quite difficult to solve for A*, and 
vice versa. The avoidance of redundant computations as presented in this report is 
also crucial for the efficiency of the A* strategy. 
Figure 5.2: Runtime of the algorithms (logarithmic scale). 



6 Conclusion 



A tree search method for cooperative motion planning has been presented. 
Acceptable computation times can be achieved by using precomputation and 
pruning. 

Further studies are necessary to evaluate the performance of the proposed al- 
gorithm in other scenarios, e.g., obstacle avoidance and overtaking maneuvers. 
Changes in the action models and discretization parameters might be required in 
order to solve these problems. 
Figure 5.3: Runtime of the algorithms (linear scale). 



like to thank all project partners participating in the development of the simulation 
system. 



Bibliography 



[ELP87] Michael Erdmann and Tomás Lozano-Pérez. On multiple moving objects. Algorithmica, 
2:477-521, 1987. 

[FBB08] Christian Frese, Thomas Batz, and Jürgen Beyerer. Cooperative behavior of groups of 
cognitive automobiles based on a common relevant picture. at - Automatisierungstechnik, 
56(12):644-652, December 2008. 

[FBB09] Christian Frese, Thomas Batz, and Jürgen Beyerer. Kooperative Bewegungsplanung 
zur Unfallvermeidung im Straßenverkehr mit der Methode der elastischen Bänder. In 
Autonome Mobile Systeme. Springer, December 2009. Accepted for publication. 

[FBWB08] Christian Frese, Thomas Batz, Martin Wieser, and Jürgen Beyerer. Life cycle management 
for cooperative groups of cognitive automobiles in a distributed environment. In Proc. 
IEEE Intelligent Vehicles Symposium, Eindhoven, June 2008. 






Christian Frese 



Figure 5.4: Nodes expanded by the different algorithms (logarithmic scale; horizontal axis: m/T). 



[KZ86] Kamal Kant and Steven Zucker. Toward efficient trajectory planning: The path-velocity 
decomposition. Journal of Robotics Research, 5(3):72-89, 1986. 

[LaV06] Steven LaValle. Planning Algorithms. Cambridge University Press, 1st edition, 2006. 

[LH98] Steven LaValle and Seth Hutchinson. Optimal motion planning for multiple robots having 
independent goals. IEEE Transactions on Robotics and Automation, 14(6):912-925, 
December 1998. 

[MH00] H. Mooi and J. Huibers. Simple and effective lumped mass models for determining 
kinetics and dynamics of car-to-car crashes. Journal of Crashworthiness, 5(1), 2000. 

[MS06] James Misener and Steven Shladover. PATH investigations in vehicle-roadside cooperation 
and safety: A foundation for safety and vehicle-infrastructure integration research. In 
Proc. IEEE Intelligent Transportation Systems Conf., pages 9-16, September 2006. 

[RN03] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice 
Hall, 2nd edition, 2003. 

[SOB06] Christian Schmidt, Fred Oechsle, and Wolfgang Branz. Research on trajectory planning 
in emergency situations with multiple objects. In Proc. IEEE Intelligent Transportation 
Systems Conf., pages 988-992, September 2006. 

[Var93] Pravin Varaiya. Smart cars on smart roads: Problems of control. IEEE Transactions on 
Automatic Control, 38(2):195-207, February 1993. 






[vdBO05] Jur van den Berg and Mark Overmars. Prioritized motion planning for multiple robots. In 
Conf. Intelligent Robots and Systems, 2005. 

[VNB+07] Stefan Vacek, Robert Nagel, Thomas Batz, Frank Moosmann, and Rüdiger Dillmann. 
An integrated simulation framework for cognitive automobiles. In Proc. IEEE Intelligent 
Vehicles Symposium, pages 221-226, June 2007. 

[WW02] Denis Wood and D. Walsh. Car to car interaction in frontal collisions: A model for the 
behaviour of the car population and options for improved crashworthiness. Journal of 
Crashworthiness, 7(1), 2002. 








On Situation Modeling and Recognition 



Yvonne Fischer 

Vision and Fusion Laboratory 
Institute for Anthropomatics 
Karlsruhe Institute of Technology (KIT), Germany 
yvonne.fischer@kit.edu 

Technical Report IES-2009-14 



Abstract: This paper gives an overview of the components that have to be 
taken into account for automatic situation recognition. Automatic situation 
recognition in complex situations enhances situation awareness of decision 
makers. The technical basis for achieving human situation awareness is pro- 
vided through data fusion. Because of this connection, there exist unified 
models which combine both situation awareness and data fusion. One of 
these unified models is reviewed here. We also explain our interpretation of 
the term situation recognition and conclude that the first challenge is to find a 
suitable formalization of the term situation. Then, the basic requirements of a 
formalized description of a situation are extracted from general definitions of 
a situation. A possible formalization of a situation, which recently appeared 
in literature and covers the requirements extracted before, is reviewed. 



1 Introduction 

Currently, the need for intelligent decision support systems in the surveillance domain 
is increasing. In existing systems, signal processing aspects like detecting, 
identifying, and tracking objects in the observed area are usually well developed. 
The next challenge is to automatically analyze the large amount of data that can be 
generated in a surveillance system. The general question is therefore how a 
situation, especially a threat situation, can be recognized. For human beings, being 
aware of the situation is essential for recognizing a critical situation. In the literature, 
the term (human) situation awareness is used to describe this. Methods and 
algorithms for analyzing complex situations using data from several sources 
are developed by the data fusion community. Hence, data fusion can be 
interpreted as the technical basis for achieving human situation awareness. In this 
article, we give an overview of the frequently used expressions situation awareness, 
situation assessment, and data fusion. In particular, we describe how these 
terms are interpreted in the literature, and we explain our interpretation of the term situation 
recognition. Our analysis leads to the conclusion that the first challenge 
in situation recognition is to find a formal description of a situation. The requirements 
for a formalized description of a situation are then extracted from various 
definitions of the term situation. Finally, we highlight one formal description of a 
situation from the literature that covers the requirements extracted before. 

The paper is structured as follows. Section 2 gives an overview of the expressions 
situation awareness, situation assessment, and data fusion in the literature and of the 
connections between them. Section 3 deals with situation recognition in general. Here, 
the essential components of how a situation can be recognized are pointed out. We 
extract the requirements for a formalized description of a situation from general 
definitions and highlight a formalized description of a situation that recently appeared 
in the literature. In Section 4, we summarize this report and give an outlook on 
future work. 



2 Situation Awareness 

2.1 Human Situation Awareness 

A commonly accepted definition of the term situation awareness of human beings 
has been introduced by Endsley, see for example [End95]: 

“Situation Awareness is the perception of the elements in the environ- 
ment within a volume of time and space, the comprehension of their 
meaning, and the projection of their status in the near future.” 

In [End95], Endsley also points out the difference between situation awareness 
and situation assessment. According to Endsley, situation awareness can be seen 
as a state of knowledge, whereas situation assessment is the process of achieving, 
acquiring, or maintaining situation awareness. Niklasson et al. [NRJ+08] use these 
terms with the same meaning. However, other interpretations also exist in the literature; 
for example, Lambert [Lam01] defines situation assessment 
as a stored representation of relations between objects and denotes the associated 
process with the term situation fusion. Nevertheless, the interpretation of the term 
situation awareness as a mental model of the environment according to Endsley is 
widely accepted throughout the literature. Based on Endsley's definition, situation 
awareness can be described through three different hierarchical levels: 
Figure 2.1: Situation Awareness (taken from Endsley [EC08]); diagram elements: World State; Situation Awareness with Level 1 (Perception), Level 2 (Comprehension), Level 3 (Projection). 



• Level 1: Perception of status, attributes, and dynamics of relevant elements 
in the environment, for example through sensory detection. 

• Level 2: Comprehension of the current situation based on knowledge 
of level 1 elements. In this level, the decision maker understands the 
significance of objects and events in relation to the operator’s goals. 

• Level 3: The ability to project the future actions of the elements in the 
environment based on the knowledge of levels 1 and 2. Level 3 consists of 
extrapolating information forward in time to determine how it will affect 
future states. 



A simplified illustration of these levels is depicted in Figure 2.1. It should be noted 
that the three levels are not processed successively but rather in parallel by 
human beings, and all of them contribute to the operator's situation awareness. 
Endsley [End95] showed that the decision process of an operator is strongly 
determined by his or her situation awareness of the environment. Therefore, a good 
decision can only be made if the situation awareness is sufficient. Endsley [End95] 
developed a model of decision making that takes situation awareness into account. 
The model addresses the impact of critical factors like attention, working memory, 
design features, workload, stress, system complexity, and automation on the operator's 
situation awareness, and it can therefore be used to derive design 
implications for complex systems. Based on the decision of an operator, an action 
is performed that changes the state of the environment, see Figure 2.1. This state 
change in turn influences the situation awareness of the operator, and therefore the 
whole process can be described as a closed loop. 
2.2 The JDL Data Fusion Model 

In [SBW99], the following definition of the term data fusion has been given: 

“Data fusion is the process of combining data to refine state estimates 
and predictions.” 

The aim of data fusion is to improve the estimate of the state of an observed environment. 
This can be achieved by exploiting the synergy of overlapping 
information to determine relationships between multiple data from possibly different 
sources. As a result, data fusion allows improved estimation of situations and 
therefore improved responses to situations. The most widely accepted model of 
the data fusion process is the JDL (Joint Directors of Laboratories) data fusion 
model [SBW99]. The model was first developed with a focus on the military domain 
but is now well established as a general architecture model for system description, 
design, and development. Consequently, several revisions of the original 
model exist; in this article, we refer to the more general version of [SBW99]. The 
JDL data fusion model is divided into different fusion levels that can be interpreted 
as a technical categorization of data fusion-related methods. The levels 
of the model are depicted in Figure 2.2 and can be described as follows: 

• Level 0 - Sub-Object Assessment: the process of estimation and prediction 
of signal states. 

• Level 1 - Object Assessment: the process of estimation and prediction of 
entity states. In this level, sensor data is combined to obtain reliable, consistent, 
and accurate estimates of an entity's attributes (e.g., identity, location, 
velocity, heading). 

• Level 2 - Situation Assessment: the process of estimation and prediction 
of relations among entities. These can be entity-to-entity relations but also 
entity-to-environment relations (e.g., close to, on top of, greater than). 

• Level 3 - Impact Assessment: the process of estimation and prediction of 
effects on situations of planned or predicted actions. 

• Level 4 - Process Refinement: the process of adaptive data acquisition and 
processing to support mission objectives. 

Level 4 can be interpreted as a process that is concerned with monitoring and 
optimizing the overall data fusion process. In the literature, levels 0 and 1 are often 
called low-level data fusion, whereas levels 2 and 3 are called high-level data fusion 
or even information fusion, a term motivated by the use of interpreted data 
as input. 
Figure 2.2: JDL Data Fusion Model (taken from Steinberg [SB04]). 



2.3 A Unified Model 

There is an obvious correspondence between the human situation awareness levels 
and the JDL data fusion levels presented in Section 2.1 and Section 2.2, respectively. 
Lambert [Lam01] associates the data fusion levels of the JDL model with 
the situation awareness levels of Endsley's mental model as follows: 

Data Fusion Level                      Situation Awareness Level 
sub-object and object assessment   ↔   perception 
situation assessment               ↔   comprehension 
impact assessment                  ↔   projection 

Level 4 of the JDL data fusion model influences all data fusion levels and is 
omitted here for simplicity. In terms of human situation awareness, the process 
refinement level can be interpreted as organizing and classifying the information 
in the mind. The JDL data fusion model can therefore be seen as the technical 
basis for achieving human situation awareness. The connection proposed by Lambert 
[Lam01] is quite evident and often used in the literature. Niklasson et al. [NRJ+08] 
used this connection to introduce a unified situation analysis model that allows 
the combination of automatic and human interaction at different levels. They argue 
that in the near future, high-level data fusion tasks (such as impact 
assessment) cannot be fully automated and will mainly be performed manually. 
Therefore, they proposed a descriptive model for integrating automatic, manual, and 
semi-automatic decision support, which is depicted in Figure 2.3. In this model, 
situation analysis implies the collection and processing of relevant information, 
the knowledge of the relations between the different pieces of information, and how the 
information influences future decisions. In Figure 2.3, human situation analysis is 
shown on the right-hand side and machine situation analysis on the left-hand 
side. An important feature of the model is the possibility to combine the analysis processes 
by means of interaction channels. The central human-computer interaction channel 
allows interaction between human and machine situation analysis, and the inter-level 
interaction channels allow interaction between different levels of human and 
machine situation analysis. In this model, situation awareness is interpreted as a 
result of human or machine situation analysis, and it can be an output of one level 
as well as an input to another level of situation analysis. Therefore, the interaction 
channels allow situation awareness to be transported between different levels 
and between humans and machines. 




Figure 2.3: A Unified Model (taken from Niklasson [NRJ+08]) 
3 Situation Recognition 

3.1 Situation Recognition in General 

Based on the information in Section 2, we can state that human decision making is a complex task and that the main aspect of making a good decision is that the decision maker achieves sufficient situation awareness. It is known that complex situations can easily lead to insufficient situation awareness, which is the main cause of human mistakes and accidents (see for example [EC08]). Hence, complex systems require automatic assistance that enhances the situation awareness of the human decision maker. On the other hand, as pointed out in [NRJ+08], the situation awareness of a human decision maker might become worse if the system processes too many tasks automatically. These facts must always be taken into account when designing a decision support system.

In today's decision support systems, the lower levels of the JDL data fusion model, such as object detection, identification, and tracking, are already automated. However, approaches to high-level data fusion, such as the automatic recognition of specific and complex situations, are still needed. In general, automatic situation recognition has to deal with the following problems:



• relevant features of reality have to be observed and stored over time,

• observations of reality usually involve uncertainties (sensor uncertainties caused, for example, by noise),

• formalized descriptions of the situations to be recognized are necessary (the system has to know which situations it should recognize), and

• situations are often not accurately describable (different configurations of observed features may imply the same situation).



Automatic situation recognition addresses the problem of how to match the observed reality with predefined situations under uncertainty; hence, situation recognition is a typical classification problem. The main difficulties are that complex situations are typically not accurately describable and that the relevant observed features are typically afflicted with uncertainties. The connection of automatic situation recognition with the JDL data fusion model and the human situation awareness model of Endsley is depicted in Figure 3.1. The top layer shows the situation awareness levels introduced by Endsley and the middle layer shows the data fusion levels of the JDL model. The bottom layer shows a possible technical realization of the upper layers, where the components of the technical realization are adopted from the description of the data fusion levels in the JDL model. The predefined situations, which are matched with the observed reality, are illustrated as an input for situation recognition. We argue that the situation recognition part should be inserted after determining the relations among entities, because situation recognition is an interpretation of the entities and their relations. Based on the results of the situation recognition, the state changes resulting from different actions can be estimated.

Figure 3.1: The connection of Situation Recognition with Situation Awareness and Data Fusion.

The first challenge in situation recognition is to provide a formal description of a situation. In the following section, the main components of different definitions of the term situation are investigated.



3.2 Requirements on a Description of a Situation 



Different online dictionaries give the following definitions of the term situation:

• “The way in which something is placed in relation to its surroundings” ([Mer])

• “The set of things that are happening and the conditions that exist at a particular time and place” ([Cam])

• “The general state of things; the combination of circumstances at a given time” ([Wor])

All definitions include objects and how they can be characterized (“state of things”). The characterization of objects can be described by associated relevant attributes. The relations between different objects are also highlighted in these definitions. Note, however, that it remains unclear whether a situation occurs at a specific point in time or is defined over a time interval.

At this point we can state the following requirements for a formal description of a 
situation: 

• objects are an important part of a situation, 

• objects are characterized by attributes, 

• time or time intervals should be considered, and 

• temporal, spatial, and attributive relations between objects are important. 

These basic requirements are illustrated by the following simplified example. Imagine the problem of recognizing the transport of illegal immigrants by sea, say from Libya to the Italian island of Lampedusa. The situation can be described by a small but fast boat that crosses the maritime border and approaches the island. The objects that are part of the situation are the boat, but also the maritime border and the island. In this example, the situation is also characterized by spatial relations. Therefore, the spatial positions of the objects are mandatory attributes to describe the situation. In the case of the boat, the position is a dynamic attribute, whereas in the case of the maritime border and the island, the position is a static attribute. Further attributes of the boat are a mixture of static and dynamic types. The attributes of the boat necessary for describing the situation are its size, velocity, and heading. The relations considered are the distance of the boat to the maritime border and the distance of the boat to the island.
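To make the four requirements concrete, the boat example can be written down as plain data: objects with static and dynamic attributes, time stamps, and spatial relations. The following Python sketch is purely illustrative. The class names, the local coordinate frame in kilometres, the straight-line border model, and the numeric thresholds for "small" and "fast" are all hypothetical assumptions, not part of the original description. Choosing such thresholds is itself a modeling decision that the example deliberately leaves open.

```python
import math
from dataclasses import dataclass

# Hypothetical thresholds for the vague terms "small" and "fast".
SMALL_LENGTH_M = 15.0
FAST_SPEED_KN = 20.0


@dataclass(frozen=True)
class BoatObservation:
    t: float            # time stamp in s; the dynamic attributes below hold at t
    x_km: float         # position in a hypothetical local metric frame (dynamic)
    y_km: float
    speed_kn: float     # velocity magnitude in knots (dynamic)
    heading_deg: float  # heading, 0 = +y, clockwise (dynamic)
    length_m: float     # size of the boat (static)


@dataclass(frozen=True)
class Island:
    x_km: float         # static position
    y_km: float


@dataclass(frozen=True)
class Border:
    y_km: float         # hypothetical: border modelled as the line y = y_km


def distance_to_island(b: BoatObservation, island: Island) -> float:
    return math.hypot(b.x_km - island.x_km, b.y_km - island.y_km)


def distance_to_border(b: BoatObservation, border: Border) -> float:
    # Signed distance: positive once the boat is on the island's side.
    return b.y_km - border.y_km


def matches_situation(prev: BoatObservation, cur: BoatObservation,
                      island: Island, border: Border) -> bool:
    """Small, fast boat that crossed the border and is approaching the island."""
    small = cur.length_m <= SMALL_LENGTH_M
    fast = cur.speed_kn >= FAST_SPEED_KN
    crossed = distance_to_border(prev, border) <= 0 < distance_to_border(cur, border)
    # Heading is not checked; "approaching" is inferred from the distance change.
    approaching = distance_to_island(cur, island) < distance_to_island(prev, island)
    return small and fast and crossed and approaching


island, border = Island(0.0, 100.0), Border(50.0)
prev = BoatObservation(0.0, 0.0, 45.0, 25.0, 0.0, 10.0)
cur = BoatObservation(600.0, 0.0, 52.7, 25.0, 0.0, 10.0)
print(matches_situation(prev, cur, island, border))  # True
```

The crisp thresholds make the check easy to read, but they show directly why a boat at 19.9 knots would not match the situation. This is the attribute classification problem discussed in the text.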

This example is quite simple, because only a few objects, in particular only one moving object, are considered and only a few attributes are necessary to describe the situation. However, even this seemingly simple situation raises further problems. One problem is the classification of the attributes: at what velocity is a boat considered fast, and at what size is it considered small? Another point concerns the inclusion of uncertainties in the description of a situation, because the situation is not accurately describable. In our example, this could concern the trajectory of the boat, which is not part of the description. The boat could make a detour instead of heading directly to the island. The interpretation of the situation is then the same, but attributes like the heading of the boat can take different values.



3.3 A Formalized Description of a Situation 

The literature does not offer a clearly defined formal notation of a situation. One general formalization has been presented by Jakobson in [JBL07] and is outlined in the following. Jakobson states that the modeling of a situation can be divided into three components: the structural, the dynamic, and the representational component. The structural component defines the topology of the represented situation. Structural objects are entities, attributes of the entities, classes of entities, and relations between entities. The dynamic component includes the behavior of the entities over time, and the representational component concerns the means by which the situation can be described. This is usually a set of languages, for example the Web Ontology Language, see [OWL], or the Unified Modeling Language, see [UML].

In the structural component, the main elements are entities $e \in E$, where $E \subseteq U$ is a subclass of all entities of the universe $U$. Each entity $e$ is characterized by its set of attributes $\{a_1, \ldots, a_p\}$, where each attribute is a collection of attribute properties like name, type, value, default value, etc. The attribute property value is represented as a triplet containing the actual value, an uncertainty estimate, and the time $t$. The value triplet of an entity is defined during the existence of the entity and is denoted by $a(t)$, $t \in \delta$, where $\delta = (t', t'')$ is the time interval from the creation time $t'$ to the clear time $t''$ of the entity. A relation is mathematically defined as a subset $R \subseteq E_1 \times \cdots \times E_m$ with $E_i \subseteq U$ for $i = 1, \ldots, m$. Similar to entities, a relation can be characterized by a set of attributes $\{b_1, \ldots, b_h\}$. The attribute value of the relation is likewise only defined during the existence of the relation and is denoted by $b(t)$ with $t \in \delta$, the lifespan of the relation. In the case of a binary relationship, the notation $e_i R e_j$ is used. The lifespan $\delta$ of a binary relation then satisfies $\delta \subseteq \delta_i \cap \delta_j$.
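The structural component translates naturally into data types. The sketch below is one possible Python encoding, not Jakobson's own representation. It assumes that the uncertainty is a single float (for example a standard deviation), that lifespans are open intervals, and that `Entity`, `ValueTriplet`, and `BinaryRelation` are hypothetical names. It enforces the two lifespan constraints stated above: $a(t)$ only exists for $t \in \delta$, and a binary relation's lifespan must satisfy $\delta \subseteq \delta_i \cap \delta_j$.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # lifespan (t', t''), treated as an open interval


@dataclass(frozen=True)
class ValueTriplet:
    value: Any
    uncertainty: float  # assumed representation, e.g. a standard deviation
    t: float


@dataclass
class Entity:
    name: str
    lifespan: Interval  # (creation time t', clear time t'')
    attributes: Dict[str, List[ValueTriplet]] = field(default_factory=dict)

    def record(self, attr: str, value: Any, uncertainty: float, t: float) -> None:
        """Store a(t); only allowed while the entity exists (t in delta)."""
        if not (self.lifespan[0] < t < self.lifespan[1]):
            raise ValueError(f"t={t} lies outside the lifespan of {self.name}")
        self.attributes.setdefault(attr, []).append(ValueTriplet(value, uncertainty, t))


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None


@dataclass
class BinaryRelation:
    name: str
    ei: Entity
    ej: Entity
    lifespan: Interval

    def __post_init__(self) -> None:
        # Lifespan of e_i R e_j must be a subset of (delta_i intersect delta_j).
        common = intersect(self.ei.lifespan, self.ej.lifespan)
        if common is None or not (common[0] <= self.lifespan[0]
                                  and self.lifespan[1] <= common[1]):
            raise ValueError("relation lifespan must lie within both entity lifespans")


boat = Entity("boat", (0.0, 3600.0))
island = Entity("island", (float("-inf"), float("inf")))  # static object
boat.record("speed_kn", 25.0, 2.0, t=600.0)
approaches = BinaryRelation("approaches", boat, island, (600.0, 1800.0))
```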

In the dynamic component, Jakobson [JBL07] distinguishes three types of situations. An entity-based situation $S_e(d)$ of an entity $e$ over a time interval $d \subseteq \delta$ is defined as the collection of specific attributes $a_1(t), \ldots, a_m(t)$ of the entity $e$ that have the same value during the time interval $d$. So for all $t, t' \in d$ the equality $a_i(t) = a_i(t')$ ($i = 1, \ldots, m$) holds. We will denote this by

$$S_e(d) = \langle a_1(t), \ldots, a_m(t) \rangle_d. \tag{3.1}$$
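Definition (3.1) can be checked mechanically when attributes are only available as sampled observations. The sketch below assumes that each attribute is a list of `(time, value)` samples and that $d$ is a closed interval. It also treats an interval without samples as "not asserted". These are assumptions of this illustration, because Jakobson defines $a_i(t)$ for every $t$. The same check applies unchanged to the relational attributes $b_i(t)$ of (3.2). The function name is hypothetical.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Samples = List[Tuple[float, Any]]  # (time, value) observations of one attribute


def entity_based_situation(attrs: Dict[str, Samples], names: Sequence[str],
                           d: Tuple[float, float]) -> Optional[Dict[str, Any]]:
    """Return <a_1(t), ..., a_m(t)>_d if every named attribute is constant on d.

    Returns None if some attribute changes inside d or has no sample in d.
    """
    lo, hi = d
    situation: Dict[str, Any] = {}
    for name in names:
        values = [v for t, v in attrs[name] if lo <= t <= hi]
        if not values or any(v != values[0] for v in values[1:]):
            return None
        situation[name] = values[0]
    return situation


attrs = {"heading_deg": [(0, 0), (5, 0), (10, 0), (20, 45)],
         "size": [(0, "small"), (20, "small")]}
print(entity_based_situation(attrs, ["heading_deg", "size"], (0, 10)))
print(entity_based_situation(attrs, ["heading_deg", "size"], (0, 20)))  # None
```

The second call fails only because the heading changes. This is the strict equality requirement that is questioned later in this section.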

Analogously, a relational entity-based situation of a relation $R$ over a time interval $d \subseteq \delta$ is defined as the collection of specific relational attributes $b_1(t), \ldots, b_h(t)$ of the relation $R$ that have the same value during the time interval $d$. So for all $t, t' \in d$ the equality $b_i(t) = b_i(t')$ ($i = 1, \ldots, h$) holds. We will denote this by

$$S_R(d) = \langle b_1(t), \ldots, b_h(t) \rangle_d. \tag{3.2}$$

The last situation type, the relational situation, considers only the relational aspect of the situation and is not concerned with the attribute values of the relation or the entities. In the case of a binary relation, the relational situation over a time interval $d \subseteq \delta_i \cap \delta_j$ can be denoted by

$$S(e_i, e_j)(d) = e_i R e_j. \tag{3.3}$$

More complex situations can now be described with the three basic situations (3.1), (3.2), and (3.3) by using a kind of set-theoretical union and intersection operations. If $S_1(d_1)$ and $S_2(d_2)$ are two basic situations of the above types, a new situation $S$ with the lifespan $d = d_1 \cap d_2$ can be defined through

$$S(d) = S_1(d_1) \cup S_2(d_2)$$

or

$$S(d) = S_1(d_1) \cap S_2(d_2).$$
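One way to read these composition rules operationally is to treat a situation as a set of facts with an attached lifespan. The combined situation then takes the set union or intersection of the facts and the intersection $d = d_1 \cap d_2$ of the lifespans. The Python sketch below follows this interpretation, which is an assumption and not Jakobson's exact semantics. The fact encoding as tuples and all names are hypothetical. If the lifespans do not overlap, no combined situation exists, and the sketch returns `None`.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

Fact = Tuple[str, ...]  # hypothetical encoding, e.g. ("boat", "size", "small")


@dataclass(frozen=True)
class Situation:
    facts: FrozenSet[Fact]
    lifespan: Tuple[float, float]  # closed time interval d = [start, end]


def _common_lifespan(s1: Situation, s2: Situation) -> Optional[Tuple[float, float]]:
    # d = d_1 intersected with d_2
    lo = max(s1.lifespan[0], s2.lifespan[0])
    hi = min(s1.lifespan[1], s2.lifespan[1])
    return (lo, hi) if lo <= hi else None


def union(s1: Situation, s2: Situation) -> Optional[Situation]:
    d = _common_lifespan(s1, s2)
    return None if d is None else Situation(s1.facts | s2.facts, d)


def intersection(s1: Situation, s2: Situation) -> Optional[Situation]:
    d = _common_lifespan(s1, s2)
    return None if d is None else Situation(s1.facts & s2.facts, d)


s1 = Situation(frozenset({("boat", "size", "small")}), (0.0, 100.0))
s2 = Situation(frozenset({("boat", "position", "inside_border")}), (50.0, 200.0))
print(union(s1, s2))  # both facts, lifespan (50.0, 100.0)
```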

This formalization of a situation by Jakobson [JBL07] is quite general but also powerful, because it allows many complex situations to be described. One critical point, however, is that uncertainty is not considered at all in the situation description. We argue, for example, that a specific situation does not imply that the relevant entity has to keep the same attribute values during a time interval. This point was also highlighted in our example in the previous section, where the heading of the boat can differ while the interpretation of the situation remains the same. On the other hand, the formalization fulfills the requirements on a formalization of the term situation that were stated in the previous section. In Jakobson's formalization, objects are represented by entities with associated attributes, time intervals are included, and relations can be of temporal, spatial, and attributive types. Whether the description is useful for situation recognition has not been investigated by Jakobson; this question may therefore be a topic of further research.
4 Summary and Outlook

In this article, an overview of the terms situation awareness, situation assessment, and data fusion is presented, with particular emphasis on the connections between these terms. Situation awareness is the key factor in making good decisions, but complex situations may easily lead to insufficient situation awareness of a human decision maker. Therefore, it is necessary to support the decision maker by automatic situation recognition. We present the main problems in automatic situation recognition and state that the first challenge is to find a suitable formalization of a situation. Based on various definitions of the term situation, we extract the requirements for a formalization of this term and outline one formalized notation from the literature.

An important topic of future work will be to identify further formalizations of the term situation in the literature. The first aim is to obtain a general, suitable, and useful formalization that can be used for situation recognition. Further research may also concern matching the observed reality with predefined situations under uncertainty.
Abstract: Navigating an autonomous underwater vehicle (AUV) is a difficult task. Dead-reckoning navigation is subject to unbounded error due to sensor inaccuracy and is inapplicable for mission durations longer than a few minutes. To bound the estimation errors, a global referencing method has to be used. SLAM (Simultaneous Localization And Mapping) is such a method. It uses repeated recognition of significant features of the environment to reduce the estimation error. The means of environment sensing used in most land applications, such as cameras, laser scanners, or GNSS signals, cannot be used under water: GNSS signals are attenuated very strongly in water, and light propagation suffers mainly from turbid water. Below a few hundred meters of water depth, there is also no sunlight. Sound waves suffer much less from these problems, which is why sonar sensors are the prevalent sensor type used under water. A main difficulty is to extract three-dimensional information from side-scan images to perform SLAM. This paper gives an overview of existing approaches to underwater SLAM using sonar data. A short outlook on the system that will be used in the TIETeK project is also presented.



1 Introduction 



Dead-reckoning navigation in vehicles is subject to unbounded error due to the accumulation of sensor inaccuracies. The most common solution to bound the estimation errors is to use some global referencing, e.g., GNSS (global navigation satellite system) signals. Under water, GNSS (or similar) signals are unavailable, and therefore other means to bound the estimation error are necessary. SLAM (Simultaneous Localization And Mapping, sometimes also termed Concurrent Mapping and Localization, CML [RRPL04]) is a method that uses significant features of the environment to reduce that error. It additionally builds a map of the environment, which allows self-localization within that map.

The difficulties in underwater SLAM are mainly due to the fact that cameras do not provide valuable information in most cases. There is no sunlight, and small particles in the water scatter light and therefore limit the range of the area that can be illuminated. However, most research on SLAM is done on land using cameras or laser scanners. For the aforementioned reasons, sonar sensors are used in the vast majority of underwater applications. There are different kinds of sonar sensors. Multi-beam sonar sensors are able to directly provide distance information, while side-scan sonars only provide an echo amplitude level varying over time.

Furthermore, SLAM relies on features in the environment that can be repeatedly observed and unambiguously associated with already known features. The seafloor naturally has a mostly fractal structure [BL97], which makes it rather difficult to distinguish different features that have a very similar appearance.

In Section 2, an overview of the different sensors used in deep sea environments is given. Section 3 shows the difficulties and ambiguities of interpreting sonar echo returns. Section 4 outlines methods for extracting 3D information from side-scan sonar data and describes some solutions to that problem from the literature. Section 5 shows approaches to SLAM in the underwater context. In Section 6, the plans for the TIETeK project and its realization are shown. Section 7 gives a short summary of the paper.



2 Sensors for Underwater Navigation and Localization

Deep sea navigation and mapping are more difficult than land or air navigation. The two main challenges are the lack of GNSS under water and the inability to use cameras for image acquisition. An exhaustive review of the underwater acoustic image generation process is given by Murino and Trucco in [MT00]. Here, only the most important sensors for underwater navigation are presented.



2.1 Dead reckoning 

Using only inertial sensors like accelerometers, gyroscopes, magnetometers, and depth sensors, the motion performed by the AUV (autonomous underwater vehicle) can be estimated. Over the time period of a mission, the accumulation of sensor noise leads to significant errors. Therefore, inertial sensors alone do not allow a sufficiently precise estimation of ego-motion; they need to be supported by complementary sensors.
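A small worked example shows why the error is unbounded. If an accelerometer has a constant bias $b$, integrating once gives a velocity error $b\,t$, and integrating twice gives a position error $\tfrac{1}{2} b t^2$. The Python sketch below integrates this numerically. It assumes a single axis, a perfectly constant bias, and no other error sources. The numbers are hypothetical, and the function name is illustrative. Random sensor noise adds a further random-walk component on top of this deterministic drift.

```python
def bias_position_error(bias: float, duration: float, dt: float) -> float:
    """Position error (m) after integrating a constant acceleration bias (m/s^2) twice."""
    vel_err = 0.0
    pos_err = 0.0
    for _ in range(int(round(duration / dt))):
        vel_err += bias * dt     # velocity error grows linearly with time
        pos_err += vel_err * dt  # position error grows quadratically with time
    return pos_err


# Hypothetical numbers: a 0.001 m/s^2 bias (roughly 0.1 mg) over a 10-minute leg.
print(bias_position_error(0.001, 600.0, 0.01))  # close to 0.5 * b * T^2 = 180 m
```

Even this tiny bias produces an error of about 180 m after only ten minutes. This is consistent with the statement that dead reckoning alone is unsuitable for longer missions.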



2.2 Cameras 

While it is technically possible to use cameras in deep sea applications, their practical use is limited. Because there is no sunlight in the deep sea, passive cameras are useless and active lighting is necessary. Lighting is possible only over short distances, because particles in the seawater scatter the light back to the camera, comparable to high-beam headlights in a snowstorm. There is an interesting approach to alleviate that problem: a technique called gated viewing overcomes the scattering on particles by opening the camera shutter only for light that has traveled a certain distance [And05].
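The timing behind gated viewing follows from the round-trip time of the light pulse. To record only light returning from distances between $d_{\min}$ and $d_{\max}$, the shutter opens $2 d_{\min} / c_w$ after the pulse and stays open for $2 (d_{\max} - d_{\min}) / c_w$. Here $c_w$ is the speed of light in water. The sketch below assumes a refractive index of 1.33 for sea water. The function name and the example range window are hypothetical.

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s
N_WATER = 1.33      # assumed refractive index of sea water


def gate_timing(d_min: float, d_max: float) -> tuple:
    """Return (delay, width) in s so that only light from [d_min, d_max] m is recorded."""
    c_water = C0 / N_WATER
    delay = 2.0 * d_min / c_water            # round trip to the near edge of the gate
    width = 2.0 * (d_max - d_min) / c_water  # round-trip duration of the gated range slice
    return delay, width


delay, width = gate_timing(10.0, 12.0)
print(f"open after {delay * 1e9:.1f} ns for {width * 1e9:.1f} ns")  # open after 88.7 ns for 17.7 ns
```

The nanosecond time scales show why gated viewing requires dedicated fast shutters and short-pulse illumination.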



2.3 Long Baseline (LBL) 

The lack of GNSS can be overcome by setting up a so-called LBL array, for which additional transducers have to be placed on the seafloor at known positions. They regularly send out an acoustic beacon signal. From the delay of the signals, an AUV can localize itself within the area covered by the transducers by means of trilateration. This approach is quite accurate but rather inflexible, complex, and relatively costly.
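The trilateration step can be sketched as a linear least-squares problem. The sketch makes several assumptions that go beyond the description above. The transducers transmit at known, synchronized times, so each delay is a one-way travel time. The sound speed is constant at 1500 m/s. The AUV's depth is known from its depth sensor. The last assumption is used because seafloor transducers lie nearly in one plane, so ranges alone determine depth poorly. Each range is reduced to a horizontal range, and subtracting the first circle equation from the others yields linear equations in $(x, y)$. The function name is hypothetical.

```python
import numpy as np

SOUND_SPEED = 1500.0  # m/s, assumed constant


def trilaterate(beacons, delays, auv_depth, c=SOUND_SPEED):
    """Least-squares AUV position from one-way travel times to >= 3 transducers.

    beacons: (n, 3) known positions (x, y, depth), depth positive downwards
    delays:  (n,) one-way travel times in s (assumes synchronized clocks)
    """
    beacons = np.asarray(beacons, dtype=float)
    r = c * np.asarray(delays, dtype=float)           # slant ranges
    rho2 = r**2 - (auv_depth - beacons[:, 2])**2      # squared horizontal ranges
    q = beacons[:, :2]
    # |p - q_i|^2 = rho_i^2 minus the i = 0 equation gives linear equations in p:
    # 2 (q_i - q_0) . p = rho_0^2 - rho_i^2 + |q_i|^2 - |q_0|^2
    A = 2.0 * (q[1:] - q[0])
    b = rho2[0] - rho2[1:] + np.sum(q[1:]**2, axis=1) - np.sum(q[0]**2)
    xy, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.array([xy[0], xy[1], auv_depth])
```

With more than three transducers, the least-squares solution averages out some of the range noise. Transducers placed on a straight line would leave the problem ill-conditioned.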



2.4 GNSS Aided Ultrashort Baseline 

In an ultrashort baseline (USBL) configuration, a surface vessel sends an acoustic beacon signal to the AUV, and the AUV sends an answer. From the angle of the incoming signal and the time delay, the position of the AUV relative to the surface vessel can be calculated.
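The geometry of this calculation is a conversion from spherical to Cartesian coordinates. The round-trip time gives the slant range, and the arrival angles give the direction. The Python sketch below makes several assumptions. The AUV transponder's internal turnaround time is known. The sound speed is constant along a straight path. Angles are given as an azimuth and a depression angle in a vessel frame with x forward, y to starboard, and z down. The function and parameter names are hypothetical.

```python
import math

SOUND_SPEED = 1500.0  # m/s, assumed constant along a straight path


def usbl_relative_position(round_trip_time, azimuth_deg, depression_deg,
                           turnaround_time=0.0, c=SOUND_SPEED):
    """AUV position relative to the vessel transducer (x forward, y starboard, z down).

    round_trip_time: time from sending the interrogation to receiving the reply (s)
    turnaround_time: known internal reply delay of the AUV transponder (assumed)
    azimuth_deg:     horizontal angle of the incoming reply, measured from x towards y
    depression_deg:  angle of the incoming reply below the horizontal
    """
    slant_range = c * (round_trip_time - turnaround_time) / 2.0
    az, dep = math.radians(azimuth_deg), math.radians(depression_deg)
    horizontal = slant_range * math.cos(dep)
    return (horizontal * math.cos(az), horizontal * math.sin(az),
            slant_range * math.sin(dep))


print(usbl_relative_position(2.1, 0.0, 30.0, turnaround_time=0.1))  # about (1299.0, 0.0, 750.0)
```

Because the position follows from an angle, a small angular error produces a lateral error that grows linearly with range.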

Together with accurate GNSS information (e.g., DGPS), this yields a global reference for the AUV position. That information is then sent down to the AUV via an acoustic link. The acoustic signals travel the distance to the AUV three times: twice for the beacon and its answer, and once for the data. As a result, the position measurement arrives with a delay of some seconds that has to be considered in the fusion step. A more detailed description is given in [MGJ01].
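The delay arithmetic can be made explicit. The whole exchange takes $3R/c$ for slant range $R$. The fix, however, refers to the moment the AUV replied, so it is $2R/c$ old, plus processing time, when it reaches the AUV. The sketch below assumes a constant sound speed of 1500 m/s. It also shows one crude way to use such a delayed fix. The method compares the fix with the dead-reckoning estimate stored for that time and shifts the current estimate by the difference. This offset correction is a simplified, hypothetical scheme and not the fusion method of [MGJ01]. A proper filter would reprocess the delayed measurement. All names are illustrative.

```python
import bisect

SOUND_SPEED = 1500.0  # m/s, assumed constant


def usbl_cycle_time(slant_range: float, c: float = SOUND_SPEED) -> float:
    """Interrogation down, reply up, fix down: three one-way trips."""
    return 3.0 * slant_range / c


def fix_age(slant_range: float, processing_delay: float = 0.0,
            c: float = SOUND_SPEED) -> float:
    """Age of the fix when it reaches the AUV; the fix refers to the moment of the reply."""
    return 2.0 * slant_range / c + processing_delay


def apply_delayed_fix(history, fix_time, fix_xy):
    """Shift the newest dead-reckoning estimate by the error observed at fix_time.

    history: list of (t, (x, y)) estimates sorted by t; the first entry with
    t >= fix_time (or the last entry) is compared against the fix.
    """
    times = [t for t, _ in history]
    i = min(bisect.bisect_left(times, fix_time), len(history) - 1)
    _, (x_then, y_then) = history[i]
    _, (x_now, y_now) = history[-1]
    return (x_now + fix_xy[0] - x_then, y_now + fix_xy[1] - y_then)


print(usbl_cycle_time(3000.0), fix_age(3000.0))  # 6.0 4.0
```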

The usable range of the sound pulse restricts the area in which one can communicate with the AUV. The term inverted USBL is used when the AUV, instead of the surface vessel, measures angle and distance to obtain its position relative to the ship.
Figure 2.1: Side-scan sonar beams (schematically).

Figure 2.2: Side-scan sonar echo formation (frontal view).



2.5 Sonar Sensors 



Sonar sensors cover a long range, as sound waves are attenuated only weakly in water. The sonar wavelengths allow resolutions down to centimeters. On the downside, sonar sensors are prone to speckle noise due to the use of coherent waves [Wes06]. In spite of the speckle noise, sonar-based sensors are the de facto standard for underwater applications. Different kinds of sonar sensors exist; the most common are side-scan sonars and multi-beam sonars.



2.5.1 Side-scan Sonars 



Side-scan (or side-looking) sonars have a very narrow beam (≈ 1°) in the horizontal plane perpendicular to the traveling direction of the AUV and a wide beam (≈ 50°) in the vertical plane (see Figure 2.1). The side-scan sonar typically has only one transducer beam per side. It sends out a so-called chirp and records the amplitude of the echo return over time. The sonar echo formation process is depicted in Figure 2.2. That information only yields a single scan line that is highly ambiguous. There is no way to directly extract the sea bottom relief from a particular echo, as this is an ill-posed inverse problem.
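The only geometry that can be read directly from a single ping comes from assuming a flat seabed. Under this assumption, each time sample can be mapped to a ground range, which is the usual slant-range correction. The sketch below assumes a constant sound speed of 1500 m/s and a known altitude above the seabed. The function name is hypothetical. Any real relief violates the flat-seabed assumption, and this is why the echo alone does not determine the relief.

```python
import numpy as np


def slant_to_ground(n_samples, sample_rate, altitude, c=1500.0):
    """Ground range (m) of each sample of one side-scan ping, flat seabed assumed.

    Samples whose slant range is shorter than the altitude come from the water
    column before the first bottom return and are marked NaN.
    """
    t = np.arange(n_samples) / sample_rate  # two-way travel time of each sample
    slant = c * t / 2.0
    ground = np.full(n_samples, np.nan)
    valid = slant >= altitude
    ground[valid] = np.sqrt(slant[valid] ** 2 - altitude ** 2)
    return ground


print(slant_to_ground(5, 75.0, 15.0))  # slant ranges 0, 10, 20, 30, 40 m
```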



Methods that try to recover the true relief from images belong to the family of shape-from-shading methods. Some of the methods used with side-scan sonar imagery are described in more detail in Section 4.
2.5.2 Multi-beam Sonars

A multi-beam sonar is the sonar equivalent of a LIDAR system. It usually uses a fan of single pencil beams and has a separate transducer for each beam. That way, it can immediately return distance information per beam. Compared to the side-scan sonar, the spatial resolution is lower, as a separate transducer is needed for each ‘pixel’. That is also the reason why multi-beam sonars are usually bulkier and need more power. On the other hand, data interpretation is easier than with side-scan data. Multi-beam sensors directly provide 3D information about the environment, whereas the 3D information from side-scan sonar images has to be extracted first and is ambiguous. Fairfield et al. [FKW05] investigated different sonar geometries with respect to their suitability for underwater SLAM applications. They concluded that a configuration with three mutually orthogonal great circles works best. This configuration yields only relatively sparse 3D information, but large areas of the sensor coverage overlap that way.
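Because every beam returns its own range, converting a multi-beam ping into 3D points is a direct geometric step. The sketch below assumes that the fan lies in the across-track plane of the sensor frame, that a beam angle of 0 points straight down, and that z is positive downwards. The function name is hypothetical. Transforming the points into a world frame would additionally require the vehicle pose, which is exactly what SLAM estimates.

```python
import numpy as np


def beams_to_points(ranges, beam_angles_deg):
    """Convert one multi-beam ping into 3D points in the sensor frame.

    The fan is assumed to lie in the across-track (y-z) plane; angle 0 points
    straight down, positive angles to starboard; z is positive downwards.
    """
    a = np.radians(np.asarray(beam_angles_deg, dtype=float))
    r = np.asarray(ranges, dtype=float)
    x = np.zeros_like(r)  # no along-track offset within one ping
    y = r * np.sin(a)
    z = r * np.cos(a)
    return np.column_stack([x, y, z])


print(beams_to_points([10.0, 12.0, 10.0], [-45.0, 0.0, 45.0]))
```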



3 Ambiguity of Side-scan Sonar Returns 

Side-scan sonar echo returns contain much more information than just the shape of the seabed. Extracting the shape information is an ill-posed inverse problem and can only be solved through regularization.

That is probably the reason why most studies did not focus on quantitative results. Instead, they concentrated on giving qualitatively correct results or on classifying the seabed into regions of different types like sand ripples, rocks, or flat silt areas.

The illumination/ensonification direction of the side-scan sonar further increases the complexity [BCW99]. The sonar beam hits the ground at a low grazing angle for most of the ensonified area. As a result, the same area will have a very different appearance when viewed from another direction.

None of the reconstruction approaches considers dynamic objects such as fish; all of them assume the seafloor to be completely static.



3.1 Challenges in Extracting Elevation Information 

To be able to reconstruct 3D shape from side-scan images, one first has to regularize the problem. The amplitude of the echo is influenced by many effects, the most important of which are:






Philipp Woock 




Figure 3.1: Many different geometries would yield the same echo image. 

• angle of the seafloor relative to the source, 

• distance to the source, 

• sediment absorption characteristics, 

• surface scattering properties, 

• absorption and dispersion of sound in water, 

• water currents, 

• varying sound speed at different depths and temperatures as well as in areas of different salinity,

• multipath propagation, and 

• sonar beam form. 

For regularization, one has to make assumptions about these aspects. Assumptions that are common in the literature are Lambertian surface scattering (i.e., a surface that is rough and isotropic and reflects energy equally in all directions) and the flat seabed assumption. Additional constraints can be given in the form of surface smoothness requirements (i.e., continuity, differentiability). Influences of sediment type, absorption, and currents are neglected in most cases. Multipath propagation is only of minor importance because sonar uses time-gated pulses. The sonar beam form is either known in advance or can be estimated in parallel [CPL07].
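Under the Lambertian assumption, the returned intensity is proportional to the reflectivity and to the cosine of the angle between the surface normal and the direction to the sensor, that is, $I = k R \cos\theta$. The 2D Python sketch below evaluates this forward model for given surface points and normals. It assumes a single across-track profile, known unit normals, and unit gain. It also omits any shadow test, so it returns zero only for surfaces that face away from the sensor. The names are hypothetical. On a flat seabed, $\cos\theta$ reduces to altitude divided by slant range, so the intensity falls off with range even without any relief.

```python
import numpy as np


def lambertian_intensity(points, normals, sensor, reflectivity=1.0, k=1.0):
    """Echo intensity under the Lambertian assumption I = k * R * cos(theta).

    points, normals: (n, 2) arrays (across-track position, height); unit normals
    sensor: (2,) sensor position
    theta is the angle between the surface normal and the direction to the sensor.
    Surfaces facing away from the sensor return 0; occlusion is not modelled.
    """
    to_sensor = np.asarray(sensor, dtype=float) - np.asarray(points, dtype=float)
    to_sensor /= np.linalg.norm(to_sensor, axis=1, keepdims=True)
    cos_theta = np.sum(np.asarray(normals, dtype=float) * to_sensor, axis=1)
    return k * reflectivity * np.clip(cos_theta, 0.0, None)


flat_points = [[10.0, 0.0], [20.0, 0.0]]
up = [[0.0, 1.0], [0.0, 1.0]]
print(lambertian_intensity(flat_points, up, sensor=(0.0, 10.0)))  # 1/sqrt(2), 1/sqrt(5)
```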

Naturally, shadowed areas of side-scan images contain no information about shape. Several different shapes could generate the same echo (see Figure 3.1). The side-scan imaging process also introduces geometric deformations, especially at short distances to the sensor: foreshortening and layover [Wes06]. Foreshortening describes the effect that a slope facing the sensor appears shortened in the sonar image. Layover describes the effect that echoes from points higher above the ground may arrive simultaneously with other echoes, so that the echoes overlap. Both effects are illustrated in Figures 3.2 and 3.3.

Figure 3.2: Long slopes may appear shortened in the echo image.

Figure 3.3: Echoes from different parts of the ground may arrive simultaneously at the sensor.

Figure 3.4: Change of side-scan resolution across-track.

Figure 3.5: Change of side-scan resolution along-track.



Related to foreshortening is the observation that side-scan sonars exhibit changing spatial resolution across-track. Along-track, the resolution also varies due to the beam widening as the range increases (see Figures 3.4 and 3.5, adapted from [Maz85]). These effects stem from the sensor principle and cannot be avoided in general.
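Both resolution effects can be estimated with flat-seabed geometry. Across-track, the ground range is $x = \sqrt{r^2 - h^2}$, so a slant-range resolution $\delta r = c\tau/2$ maps to $\delta x = \delta r \cdot r / x$. This value becomes large near nadir. Along-track, the beam footprint is approximately $r \cdot \theta_h$ for a horizontal beam width $\theta_h$ in radians. The sketch below assumes a constant sound speed, a flat seabed, and a plain pulse of length $\tau$. For a chirp, the effective compressed pulse length would be used instead. The names are hypothetical.

```python
import math


def resolution_cell(slant_range, altitude, pulse_length, beam_width_deg, c=1500.0):
    """Approximate across- and along-track ground resolution (m) at one slant range.

    across: slant-range resolution c*tau/2 stretched by r/x, the derivative of the
            slant-to-ground mapping (large near nadir); flat seabed assumed
    along:  horizontal beam footprint, small-angle approximation r * beam width
    """
    if slant_range <= altitude:
        raise ValueError("slant range must exceed the altitude")
    ground = math.sqrt(slant_range**2 - altitude**2)
    across = (c * pulse_length / 2.0) * slant_range / ground
    along = slant_range * math.radians(beam_width_deg)
    return across, along


for r in (25.0, 100.0):  # hypothetical: 20 m altitude, 0.1 ms pulse, 1 degree beam
    print(r, resolution_cell(r, 20.0, 1e-4, 1.0))
```

The two trends run in opposite directions. Across-track resolution improves with range, while along-track resolution degrades.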



Despite these drawbacks and ambiguities, side-scan sonars are used very often, as their simple construction makes them comparatively cheap. The output can also be easily visualized and is often interpreted by humans. These properties probably make side-scan sonars the most prevalent sensor used in deep sea applications.
3.2 Simulator for Side-scan Imagery



J. Bell developed a sophisticated simulator for side-scan imagery based on ray tracing. It considers a stratified water column, sonar beam width, sonar beam directivity, multipath propagation, transmission losses, and ego-motion [BL97].



Most of the ambiguities shown in Section 3.1 have been taken into account. The resulting images were not only compared visually to real sonar images; their image statistics were also checked against those of real sonar images. The simulator allows shape reconstruction algorithms to be compared quantitatively against ground truth. A good simulation tool can also be seen as a formulation of the forward problem that needs to be inverted.



4 Elevation Information from Side-scan Data 

To be able to make full use of the sensor measurements for SLAM, it is desirable 
to obtain a 3D representation from the 2D sonar measurements. A few methods to 
accomplish that are presented in this section. 



4.1 Estimating Elevation from Shadows 

Reed et al. [RPB04] use co-operative statistical snakes to detect highlight regions with neighbouring shadow regions that face away from the sensor. Assuming an otherwise flat seabed, reconstructing object elevation from the shadow lengths with the intercept theorem is a simple and natural approach. Some methods work on two-dimensional images, which have to be created by first registering the individual scan lines. The easiest approach is to simply stack the scan lines. More sophisticated approaches pre-process the scan lines before stacking, taking into account the geometrical configuration as well as the vehicle motion.
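The intercept-theorem argument fits in a single formula. The sensor is at altitude $H$ above a flat seabed. The shadow starts at ground range $x$, taken here as the object's far edge, and has length $L$. The ray grazing the object's top then meets the seabed at $x + L$, so $h / L = H / (x + L)$. The sketch assumes that both quantities have already been converted to ground ranges, for example with a slant-range correction. The function name is hypothetical.

```python
def height_from_shadow(altitude, shadow_start, shadow_length):
    """Object height from its acoustic shadow via the intercept theorem.

    The ray from the sensor at (0, altitude) grazes the object top at
    (shadow_start, h) and meets the flat seabed at shadow_start + shadow_length:
        h / shadow_length = altitude / (shadow_start + shadow_length)
    All distances are horizontal ground ranges in metres.
    """
    return altitude * shadow_length / (shadow_start + shadow_length)


print(height_from_shadow(20.0, 60.0, 20.0))  # 5.0
```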



4.2 Propagation Shape-from-Shading (SfS) 

Propagation SfS was pioneered by Langer and Hebert [LH91] and modified with a different scattering model by Dura, Bell and Lane [DBL04]. The seafloor reconstruction uses one scan line at a time. The reconstruction starts directly beneath the sensor, where a flat seabed is assumed. The vertical distance from the sensor to the seafloor is either known or can be estimated from the sonar data. Starting from there, the inclination angle of the surface normal is propagated towards the outer end of the sensor coverage area, depending on the reflectivity. Propagation SfS is quite robust against shadowing effects and gives good results on directional surfaces like sand ripples. Langer and Hebert suggest additional scan line preprocessing for noise reduction. They propose median filters and a graduated non-convexity (GNC) filter, which is able to preserve discontinuities. Pre-filtering is necessary because noise easily corrupts the reconstruction, as errors accumulate towards the outer ends. Because every sonar line is processed individually, this approach lends itself well to online use. However, the correct mutual registration of the processed scan lines depends on the ego-motion of the AUV, which has to be estimated. Ego-motion estimation errors lead to errors in the 3D reconstruction of the seafloor surface.
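The propagation idea can be illustrated on a single scan line. This sketch is a deliberately simplified stand-in, not the algorithm of Langer and Hebert or of Dura et al. It assumes Lambertian scattering with known, uniform reflectivity $k$, intensities already mapped to known ground ranges $x$, and height 0 at the first sample. Let $\beta$ be the depression angle from the sensor to a point and $\alpha$ the local slope angle. Then the Lambertian model gives $I = k \sin(\alpha + \beta)$. Each sample therefore yields a slope, which is integrated outward. The function names are hypothetical, and `simulate_intensity` only exists to generate consistent test data.

```python
import numpy as np


def propagate_profile(intensity, x, altitude, k=1.0):
    """Recover heights z along one scan line, starting from z = 0 at x[0].

    intensity: echo intensity at ground ranges x (ascending, starting beside nadir)
    altitude:  sensor height above the seabed at x[0]
    Model (assumed): I_i = k * sin(alpha_i + beta_i), where alpha_i is the slope
    angle towards the next sample and beta_i the depression angle to sample i.
    """
    x = np.asarray(x, dtype=float)
    z = np.zeros_like(x)
    for i in range(len(x) - 1):
        beta = np.arctan2(altitude - z[i], x[i])
        s = np.clip(intensity[i] / k, 0.0, 1.0)
        alpha = np.arcsin(s) - beta  # principal branch: alpha + beta <= 90 degrees
        z[i + 1] = z[i] + np.tan(alpha) * (x[i + 1] - x[i])  # errors carry outwards
    return z


def simulate_intensity(z, x, altitude, k=1.0):
    """Matching forward model for testing; the last sample has no slope (NaN)."""
    z, x = np.asarray(z, dtype=float), np.asarray(x, dtype=float)
    alpha = np.arctan(np.diff(z) / np.diff(x))
    beta = np.arctan2(altitude - z[:-1], x[:-1])
    return np.append(k * np.sin(alpha + beta), np.nan)
```

Because every height builds on the previous one, a noise-induced slope error at one sample shifts all samples beyond it. This illustrates why the text stresses pre-filtering.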



4.3 Linear Shape-from-Shading 



The Linear SfS method [DBL04] from Dura, Bell and Lane is based on the work of Bell et al. on side-scan image directionality effects [BCW99]. The approach works in the frequency domain and uses the fact that the sonar image is a directionally filtered version of the seabed height map. They provide a linear transform that relates the Fourier transform of the two-dimensional sonar images to the seabed height.

Linear SfS is not as robust as Propagation SfS when processing ripples; however, it is much more robust in the presence of noise and when processing isotropic seabeds. Because it processes two-dimensional sonar images, it is more difficult to use as an online method.
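A simplified version of the frequency-domain idea helps to show why the problem becomes linear. For small slopes, $\sin(\alpha + \beta) \approx \sin\beta + \alpha \cos\beta$. After the range-dependent terms are removed, the image is therefore roughly proportional to the across-track derivative $\partial h / \partial x$. A derivative is a linear filter with Fourier response $i k_x$, so the height can be recovered by dividing by that response. This small-slope derivative model is an assumption of the sketch, not the exact transform of Dura et al. The function name is hypothetical. Components with $k_x = 0$ cannot be recovered from a derivative and are set to zero.

```python
import numpy as np


def height_from_slope_image(image, dx=1.0):
    """Invert image = dh/dx (x = across-track, axis 1) in the 2D Fourier domain.

    Components with k_x = 0 (height variations constant along x) cannot be
    recovered from a derivative and are set to zero.
    """
    nx = image.shape[1]
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)  # angular wavenumbers along x
    spectrum = np.fft.fft2(image)
    height_spec = np.zeros_like(spectrum)
    nonzero = kx != 0
    height_spec[:, nonzero] = spectrum[:, nonzero] / (1j * kx[nonzero])  # undo i*k_x
    return np.real(np.fft.ifft2(height_spec))
```

Dividing by $k_x$ amplifies low across-track frequencies. In practice, this step therefore needs regularization against noise.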



4.4 Hierarchical Recovering of Shape from Side-scan Data 

In the works of Coiras, Petillot and Lane ([CPL07], [CPL05]), an elevation map is reconstructed from side-scan sonar images. They start at a coarse resolution, using an expectation-maximization approach with gradient descent to iteratively refine the most probable shape that corresponds to the echo amplitude. They assume Lambertian surface scattering.

Their method is not designed as an online method. They work with two-dimensional side-scan images. This has the advantage of incorporating parameter dependencies across scan lines, but it requires scan line registration first.
5 SLAM on Sonar Data

SLAM is a method to reduce and bound the uncertainty in vehicle localization and in the resulting environment map. It does so through repeated re-observation of salient features and by incorporating prior knowledge via a motion model. For the method to succeed, it is essential that a feature can be recognized with high certainty and is not mistaken for another one. When the algorithm is able to identify landmarks as already visited ones, the accumulated errors from inertial navigation can be significantly reduced.

Side-scan sonar images, however, do not provide 3D information directly. The sensor output after registering the scan lines is 2D. Strictly speaking, 3D information is not necessary to perform SLAM. However, features look different when ensonified from different angles, which makes it nearly impossible to recognize landmarks from the 2D image alone. Therefore, it is clearly beneficial to make use of the three-dimensional reconstruction from the methods described in Section 4.



5.1 SLAM on Side-scan Data 

An approach using side-scan sonar data for SLAM is introduced in [RRPL04]. The
localization is done only in 2D with three parameters (2D position and yaw angle,
[x, y, φ]). Besides a side-scan sonar, a forward-looking sonar was investigated,
which can re-observe landmarks more frequently. A comparison with dead reckoning
alone shows that the drift error of the estimate that additionally uses the
side-scan sonar is half that of pure dead reckoning. The landmark matching, however,
was done offline by hand, yielding perfect matches and no mismatches, and all
landmarks were observed two or three times. The authors use EKF-SLAM, which is
unsuitable for large numbers of landmarks and works correctly only when no
mismatches are present. The focus of the paper, however, lies more on offline
post-processing with a Rauch-Tung-Striebel (RTS) smoother [Sar08], and the SLAM
algorithm was not evaluated without the RTS smoother. The authors emphasize the
great value of smoothing as a post-processing step.
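The benefit of smoothing can be illustrated with a scalar example. The sketch below runs a Kalman filter forward over a 1D random-walk state and then the Rauch-Tung-Striebel backward pass, which revises each estimate using the later measurements. The random-walk motion model, the noise variances q and r, and the function name `kalman_rts` are assumptions for illustration; the cited work operates on the full vehicle and landmark state.

```python
def kalman_rts(zs, q, r, x0=0.0, p0=1.0):
    """Forward Kalman filter plus backward RTS pass for a 1D random walk.

    Model: x_k = x_{k-1} + w, w ~ N(0, q);  z_k = x_k + v, v ~ N(0, r).
    Returns filtered means/variances and smoothed means/variances.
    """
    xf, pf, xp, pp = [], [], [], []
    x, p = x0, p0
    for z in zs:
        x_pred, p_pred = x, p + q          # predict (state transition F = 1)
        k = p_pred / (p_pred + r)          # Kalman gain
        x = x_pred + k * (z - x_pred)      # measurement update
        p = (1.0 - k) * p_pred
        xp.append(x_pred)
        pp.append(p_pred)
        xf.append(x)
        pf.append(p)
    xs, ps = xf[:], pf[:]
    for i in range(len(zs) - 2, -1, -1):   # backward (RTS) pass
        c = pf[i] / pp[i + 1]              # smoother gain
        xs[i] = xf[i] + c * (xs[i + 1] - xp[i + 1])
        ps[i] = pf[i] + c * c * (ps[i + 1] - pp[i + 1])
    return xf, pf, xs, ps

xf, pf, xs, ps = kalman_rts([0.1, 0.4, 0.2, 0.9, 1.1, 0.8], q=0.05, r=0.2)
```

Because every smoothed estimate also draws on later measurements, its variance never exceeds the filtered one; only the last time step is left unchanged. This is why smoothing pays off in offline post-processing.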



5.2 3D SLAM on Sonar Data 

Fairfield et al. [FKW07] use multi-beam sonar information to create a map of the
environment. The sonar sensors provide sparse 3D information about the shape of the
surroundings. Their SLAM method is based on Rao-Blackwellized particle filters
(RBPFs) [MT07]. Additionally, they use a sophisticated data structure called
Deferred Reference Counting Octrees (DRCO) to keep memory requirements low, since
each particle has to carry the full evidence grid. They present an adaptive method
that uses only as many particles as are feasible for real-time operation, and they
stay well below 10 m of localization error even after about 10000 SLAM iterations.
This is a very promising approach that shows how to reconcile two conflicting
requirements: real-time operation and the use of as many particles as possible.
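The memory problem arises because every particle needs its own map. A common remedy, which DRCO refines, is structural sharing: a copied particle shares the octree with its parent, and an update copies only the nodes on the path to the modified cell. The Python sketch below shows this path-copying idea with immutable octree nodes; Python's own reference counting reclaims nodes that are no longer shared, standing in for DRCO's explicit, deferred reference counts. It illustrates the sharing principle only, not the DRCO algorithm, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Node:
    children: Tuple[Optional["Node"], ...] = (None,) * 8
    value: Optional[float] = None  # occupancy evidence stored at leaves

def child_index(x, y, z, level):
    """Octant of integer cell (x, y, z) selected by the bit at 'level'."""
    return ((x >> level) & 1) | (((y >> level) & 1) << 1) | (((z >> level) & 1) << 2)

def set_cell(root, x, y, z, value, depth):
    """Return a new root; only the depth + 1 nodes on the path are new."""
    if depth == 0:
        return Node(value=value)
    node = root if root is not None else Node()
    idx = child_index(x, y, z, depth - 1)
    kids = list(node.children)          # unchanged children stay shared
    kids[idx] = set_cell(node.children[idx], x, y, z, value, depth - 1)
    return Node(children=tuple(kids), value=node.value)

def get_cell(root, x, y, z, depth):
    node = root
    for level in range(depth - 1, -1, -1):
        if node is None:
            return None
        node = node.children[child_index(x, y, z, level)]
    return None if node is None else node.value

parent_map = set_cell(None, 1, 2, 3, 0.7, depth=3)
particle_map = set_cell(parent_map, 5, 5, 5, 0.9, depth=3)  # particle's update
```

After the update the parent map is unchanged, and the octant the update did not touch is the very same object in both versions, so it is stored only once.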



6 Outlook - The TIETeK Project 



In the TIETeK (Tiefsee-Inspektions- und Explorations-Technologiekonzept, i.e.,
deep-sea inspection and exploration technology concept) project, an AUV is being
developed that is able to autonomously map and explore the bottom of the deep sea.
As water pressure in the deep sea is very high, most parts of the AUV should be
built in a pressure-independent manner to withstand depths of about 6000 m. At the
same time, the emphasis is on creating a comparatively cheap, highly configurable
vehicle that can use different sensors depending on the task at hand.

The main sensor will be a high-resolution side-scan sonar, accompanied by IMUs
(inertial measurement units), a DVL (Doppler velocity log), and depth sensors.
Additionally, SLAM will be used to support navigation and to facilitate exploration.
The SLAM procedure to be developed should also work with multi-beam sonar data.



6.1 Project Workflow 

The first step will be to extract seafloor elevation information from the side-scan 
sonar data. Vehicle movement at acquisition time has to be taken into account to 
interpret the echo correctly. 

Next, it is important to find salient features within the relief that can be
re-observed with high reliability. This is a very challenging task, as the
appearance of the features changes significantly with the viewing direction. The
relief of, e.g., sand ripples may be nearly unobservable for the side-scan sonar
when viewed from certain angles. If landmark extraction and association turn out to
be too ambiguous for this approach to be feasible, a grid-based approach using 2.5D
elevation maps will be evaluated.
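A 2.5D elevation map stores one height per horizontal grid cell instead of full 3D geometry. The following is a minimal sketch, assuming that each cell keeps a Gaussian height estimate (mean and variance) and fuses new soundings by inverse-variance weighting; the class name, the cell indexing, and the fusion rule are illustrative choices, not part of the TIETeK design.

```python
import math

class ElevationGrid:
    """2.5D grid: each cell holds a Gaussian height estimate (mean, variance)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = {}  # (i, j) -> (mean, variance)

    def _key(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def add_measurement(self, x, y, z, var):
        """Fuse one height sounding z with variance var into its cell."""
        key = self._key(x, y)
        if key not in self.cells:
            self.cells[key] = (z, var)
            return
        mean, v = self.cells[key]
        k = v / (v + var)  # inverse-variance weight (static 1D Kalman update)
        self.cells[key] = (mean + k * (z - mean), (1.0 - k) * v)

    def height(self, x, y):
        """(mean, variance) of the cell containing (x, y), or None if unseen."""
        return self.cells.get(self._key(x, y))

grid = ElevationGrid(cell_size=0.5)
grid.add_measurement(0.1, 0.2, -10.0, 1.0)
grid.add_measurement(0.3, 0.4, -12.0, 1.0)
```

Both soundings fall into the same cell, so the cell ends up at the average height with half the variance. Such a grid can be matched as a whole, which avoids explicit landmark extraction.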
Thirdly, the SLAM method will be employed to obtain accurate navigation and map
information. It is crucial to keep the maps small and to use the computing
resources efficiently, as the algorithm needs to run in real time on COTS
(commercial off-the-shelf) hardware inside the AUV.

The map and the improved ego-motion information from the SLAM algorithm will be
used to support vehicle navigation and will form the basis for explorative actions
taken autonomously by the vehicle. In an additional post-processing step, maps of
higher quality than the online maps will be built.



7 Conclusion 

The side-scan sonar echo return is highly ambiguous, which makes shape extraction
from that signal a challenging task. Additionally, SLAM in underwater applications
is far from being as well understood as in airborne or land applications. Since
ground truth is basically unavailable, quantitative evaluation is extremely
difficult. This is why an underwater simulation environment is of great help.

So far, SLAM has been used only sparsely in underwater applications. As
computational power becomes cheaper, more real-time solutions to underwater SLAM
will surely emerge.
Abstract: Modern autonomous systems require permanent situation awareness: the
perception, comprehension, and prediction of the surrounding environment. The basis
for such awareness is reliable world modelling that serves as an efficiently
structured memory.

This contribution proposes several new approaches to world modelling. Progressive
Mapping provides a dynamic description of real world elements by mapping sets of
objects and attributes. Prior knowledge about object types and relations is
introduced as a collection of semantic networks of classes, while dynamic relations
within the world model are represented by semantic networks of objects. The
contribution also presents the processing of degree-of-belief distributions and a
way to calculate memory limits.



1 Introduction 

Operating an autonomous system requires constant situation awareness (SA). The most
widely acknowledged definition of SA was given by Endsley [End95]:

“the perception of elements in the environment within a volume 
of time and space, the comprehension of their meaning, and the 
projection of their status in the near future” 

The foundation for such awareness is world modelling, which describes the
surrounding environment and serves as an information hub for all subsystems. The
information can be provided by sensors or by a prior knowledge storage. Each
incoming piece of information contains some amount of uncertainty, which can be
characterized by a degree-of-belief (DoB) distribution and merged into existing
models by means of Bayesian fusion.

Modern world modelling systems for autonomous systems are usually limited to a
concrete set of tasks. For example, for path planning, exploration, and
localization tasks, the modelling system is often realized by SLAM or other
geometric and topological systems. As another example, an autonomous receptionist
or medical assistant robot classifies surrounding objects and situations for
reactive decision making. In the most general case, the world modelling system has
to represent all objects and relations of the surrounding environment and model
them as a digital sandbox. This allows for reactive as well as proactive operation,
both of which are vital for many advanced applications (e.g., autonomous cars or
humanoid robots).

The world modelling subsystem presented in this contribution accumulates
information, builds relevant models of the real world, and provides information to
other subsystems. It can process information streams in the form of Degree of
Belief distributions with parameterized uncertainties and merge these distributions
by Bayesian fusion. An adequate and efficient environment representation requires
modelling both the environment elements and their relations. Each element can be
described as an object, and its relations as connections in semantic networks. It
is also important to specify the semantics of each object by introducing information
from prior knowledge. The modelling subsystem also serves as an information hub for
all subsystems, from the lowest levels (hardware response) to the highest ones
(planning, context recognition, etc.).

The presented material is structured as follows: Section 2 describes the structure
of the prior knowledge. Section 3 deals with dynamic modelling. Section 4
introduces the idea of probabilistic matching of dynamic objects to prior knowledge.
Section 5 presents the experimental setup and test results. Finally, a short summary
is given in Section 6.



Modelling domains The description of the environment is performed by modelling
its elements and their relations. It is convenient to design the modelling system
within the object-oriented methodology, where real world elements are represented by
classes and particular things by class instances (objects). For clear terminology,
it is important to distinguish the domains of the environment and model realms: the
Dynamic World (real things and relations), Dynamic Models (objects and networks),
and Prior Knowledge (concepts of classes and relations, prior records)
(Figure 1.1).
Figure 1.1: Modelling domains.



The Real world realm is represented by the Dynamic World domain. It is popu- 
lated by environment elements, which can be perceived by sensors or estimated by 
cognition processes. 

The World modelling realm covers the Prior Knowledge and Dynamic Models domains.
The Prior Knowledge domain (Section 2) contains Class concepts (known types of real
world elements), Relation concepts (relations between classes), and Prior records
(information about specific environment elements). The Dynamic Models domain
(Section 3) contains semantic networks of objects (Figure 1.2).




Figure 1.2: Example of a semantic network. 
2 Prior Knowledge

The prior knowledge domain defines the class and relation types known to the 
modelling subsystem, relations between concepts, and additional information 
about specific objects. 

Class concepts The Class concepts define object, attribute, and relation types
and their semantics.



Relation concepts Relations specify possible connections between objects and
are defined as semantic networks of Class concepts. Most previous research on
knowledge representation for autonomous systems focuses on a narrow set of tasks,
using one or two semantic networks (e.g., geometrical and functional primitives
[RDR95] or spatial hierarchies with additional attributes such as color and weight
[Fri98, Rie97, Rog03]). However, an autonomous system requires different relations
(e.g., “part of”, “is a”) between correlated classes or objects. Multiple semantic
networks over the same classes or objects are therefore vital (Figure 2.1), as they
allow for multi-dimensional relation processing.
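One possible way to hold several semantic networks over the same classes or objects is a separate labelled edge set per relation type. The following Python sketch is an assumption about a suitable data structure, not the implementation described in this contribution; the class `SemanticNetworks` and the example relations are hypothetical.

```python
from collections import defaultdict

class SemanticNetworks:
    """Several labelled relation graphs over one shared set of nodes."""

    def __init__(self):
        # network name -> source node -> set of target nodes
        self.networks = defaultdict(lambda: defaultdict(set))

    def relate(self, network, source, target):
        self.networks[network][source].add(target)

    def targets(self, network, source):
        """Direct neighbours of 'source' in one network only."""
        return set(self.networks[network].get(source, ()))

    def ancestors(self, network, node):
        """Transitive closure along one network, e.g. all 'is a' parents."""
        seen, stack = set(), [node]
        while stack:
            for parent in self.networks[network].get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

nets = SemanticNetworks()
nets.relate("is a", "cup", "container")
nets.relate("is a", "container", "object")
nets.relate("part of", "handle", "cup")
```

The same node ("cup") takes part in the "is a" and the "part of" network at once, so a query can follow one relation type without being confused by the others.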




Figure 2.1: Multiple semantic networks (Geometry, Functionality, Specialization).



Levels of abstraction One of the relations defined within the relation concepts
is specialization, i.e., a hierarchy of classes with the “is a” connection
(Figure 2.1). Specialized classes have more attributes than their more abstract
parent class. Specializations can be assigned arbitrarily to abstraction levels
according to known attributes. The hierarchy can be represented as a pyramid of
abstraction levels (Figure 2.2) with the blank class at the top, meaning the
existence of “something”. Each object of a world model can be processed on any
abstraction level above the one it has reached [GHB08], thus containing information
of different granularity. Dynamic objects for world modelling can be instantiated
from the Class concepts of the abstraction pyramid hierarchy [GHB08].

Object-Oriented World Modelling for Autonomous Systems

Figure 2.2: Classification abstraction pyramid (levels: Blank at the top, Dummy,
Concrete; built from the predefined class and relation concepts).

The concept of abstraction hierarchies is motivated by several ideas. At the very
beginning, the autonomous system knows little about the environment and perceives
only as much of its surroundings as its sensors allow. As time passes, it gathers
more information about the surrounding world and updates the model objects,
lowering their abstraction level in the pyramid. The shift to lower levels occurs,
for example, by specializing the object from a common structure (e.g., “box”) to a
more specialized structure (e.g., “postal package”). If inconsistencies are found
in the model (e.g., the “postal package” starts to fly), the object is raised to
upper abstraction levels of the pyramid hierarchy, which corresponds to removing
wrong specializations.
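The lowering and raising of abstraction levels can be sketched on a small specialization hierarchy. In this hypothetical Python example, each class stores its parent (“is a”), specialization moves an object one level down, and a violated constraint of the current class moves it back up until no constraint is violated. The class names, the hierarchy, and the “flying” constraint are illustrative only.

```python
# Hypothetical fragment of a specialization ("is a") hierarchy: child -> parent
PARENT = {"box": "dummy", "postal package": "box", "dummy": "blank"}

# Hypothetical constraints attached to individual classes
CONSTRAINTS = {"postal package": lambda attrs: not attrs.get("flying", False)}

def specialize(current, candidate):
    """Lower the abstraction level if candidate directly specializes current."""
    if PARENT.get(candidate) != current:
        raise ValueError(f"{candidate!r} does not specialize {current!r}")
    return candidate

def generalize_until_consistent(cls, attrs):
    """Raise the object while the constraint of its current class is violated."""
    while cls in CONSTRAINTS and not CONSTRAINTS[cls](attrs):
        cls = PARENT[cls]  # remove the wrong specialization
    return cls

level = specialize("box", "postal package")
level_after_flight = generalize_until_consistent(level, {"flying": True})
```

An observed “flying” postal package violates its class constraint, so the object falls back to the more abstract “box” instead of being deleted.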

Another reason for processing objects on different abstraction levels is context
dependency. The same object can be viewed differently depending on the context: a
bookcase is considered an obstacle during path planning (leaving many attributes and
the contained books out of scope), but a storage for books when searching for a
text.

Despite the advantages of categorizing objects according to prior knowledge, the
classification mechanism is affected by several problems:

1. Multiple hierarchies - it is not clear how to organize a specialization
hierarchy if the system receives data in different sequences, e.g., in one case
“shape” before “temperature” (a visible distant thing), in another case
“temperature” before “shape” (a large complex structure in the proximity). Which
attribute should be on a lower level of abstraction is not obvious in this case;

2. Fixed construct limitations - the classification approach is, in principle,
rigid and limited compared to symbolic description [Ahl02]. This means that objects
can be classified only among predefined categories and on the basis of expected
attributes. An arbitrary unexpected situation cannot be processed properly;

3. Quantity complexity - a set of classes can be enormously large for representing
any complex real-life environment. Large information sets are hard for programmers
to develop and maintain and also consume memory and CPU time. Moreover, large sets
lead to more mismatches due to the large number of objects with similar structure
and different semantics.



Due to the first two problems, the classification approach is an engineering
workaround limited to the narrow scope of the given tasks. A more flexible approach
is needed to advance objects through hierarchy levels and to select matching
attributes for different tasks. This approach is Progressive Mapping (Section 3.3),
which allows a flexible selection of attribute sets for each given task or for
matching to prior knowledge (Section 4.1).



Prior records Prior records contain information about specific things and 
relations of the environment (e.g., information about a specific person or location). 



Prior knowledge storage The prior knowledge about the real world can be stored as
semantic networks in a back-end layer. The idea of a back-end stems from the
consideration that the prior knowledge database should represent all relevant
knowledge about the real world and can therefore be too complex for real-time
dynamic modelling.

The size of the prior knowledge collection is limited by the combinatorial
complexity of large multidimensional data sets. To overcome this problem (the third
problem in Section 2), a modular thematic architecture is proposed. Each module
contains information relevant to a current scene (e.g., kitchen or living room) or
context (e.g., party or siesta). The back-end system can load and unload modules on
demand of the cognition subsystem.
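The on-demand loading of thematic modules can be sketched as follows. This Python example assumes that each module is a simple mapping from concept names to facts and that lookups search only the currently loaded modules; the class `PriorKnowledgeBackEnd` and its methods are hypothetical.

```python
class PriorKnowledgeBackEnd:
    """Keeps only the thematic modules requested by the cognition subsystem."""

    def __init__(self, library):
        self.library = library  # module name -> {concept: facts}
        self.loaded = {}        # currently active modules

    def load(self, name):
        self.loaded[name] = self.library[name]

    def unload(self, name):
        self.loaded.pop(name, None)

    def lookup(self, concept):
        """Search only the loaded modules, keeping queries small."""
        for module in self.loaded.values():
            if concept in module:
                return module[concept]
        return None

library = {
    "kitchen": {"cup": {"holds": "liquid"}},
    "living room": {"sofa": {"seats": 3}},
}
back_end = PriorKnowledgeBackEnd(library)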



3 Dynamic Models 

The Dynamic Models domain contains semantic networks with objects and their 
relations, which are found in the surrounding environment. 



3.1 Uncertainty 

Dynamical modelling implies modelling object attributes, which in the ideal case
are single values. However, sensor data contain uncertainties due to acceptance
problems or environment noise. The uncertainty can be specified, for example, via
confidence levels for discrete values and the standard deviation for continuous
values. A more informative representation, though, is a Degree of Belief (DoB)
distribution P (Table 3.1). The DoB of objects with several attributes can be
simplified to a marginal distribution for each attribute or given as a joint DoB
distribution of correlated parameters.

Table 3.1: Attribute modelling with uncertainty.

Representation                 | Type (t)                     | Mass (m) [g]
-------------------------------+------------------------------+----------------------
Fixed value                    | Netbook                      | 1000
Fixed value with uncertainty   | Netbook (48%)                | 1000 ± σ (σ = 32)
Marginal DoB distribution      | Netbook (48%)                | 0.48 · N(1000, 32)
                               | Ultraportable laptop (40%)   | + 0.40 · N(1200, 35)
                               | Thin laptop (12%)            | + 0.12 · N(1600, 45)
Pair DoB distribution          | P(s, m)
Joint DoB distribution         | P(s, m, ..., k)

The lower a representation appears in Table 3.1, the more information about the
attribute it contains. Information can always be processed at any level above, but
not below. Although the joint DoB is preferable as the most complete description,
it is not used in this contribution because its complexity grows rapidly (e.g.,
exponentially) with the number of attributes.
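The statement that information can be processed at higher levels but not at lower ones can be checked on the example of Table 3.1. The Python sketch below collapses the marginal DoBs into the “fixed value with uncertainty” and “fixed value” rows; the reverse step is impossible because the other mixture components are discarded. Taking the dominant mixture component, and the simple entry counts at the end (n^k joint entries versus n·k marginal entries for k attributes with n discrete values each), are illustrative assumptions.

```python
# Marginal DoBs from Table 3.1: type probabilities and mass mixture (w, mean, sigma)
TYPE_DOB = {"Netbook": 0.48, "Ultraportable laptop": 0.40, "Thin laptop": 0.12}
MASS_DOB = [(0.48, 1000.0, 32.0), (0.40, 1200.0, 35.0), (0.12, 1600.0, 45.0)]

def fixed_value_with_uncertainty(type_dob, mass_dob):
    """Keep only the most probable type and the dominant mass component."""
    best_type = max(type_dob, key=type_dob.get)
    _, mean, sigma = max(mass_dob)  # tuples compare by weight first
    return (best_type, type_dob[best_type]), (mean, sigma)

def fixed_value(type_dob, mass_dob):
    """Drop the uncertainty as well: the least informative row of the table."""
    (best_type, _), (mean, _) = fixed_value_with_uncertainty(type_dob, mass_dob)
    return best_type, mean

def joint_entries(n_values, n_attributes):
    return n_values ** n_attributes  # full joint DoB table

def marginal_entries(n_values, n_attributes):
    return n_values * n_attributes   # one marginal per attribute
```

With ten attributes of three values each, a joint table already needs 59049 entries, against 30 for the marginals, which illustrates why the joint DoB is not used here.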



The Degree of Belief considered in this contribution is a distribution P(s) over a
single attribute parameter s (Table 3.1). Each attribute of a modelling object is
given by a DoB distribution P. Incoming information I_1 is fused with the existing
information I_0:

$$P(s \mid I_0) \rightarrow P(s \mid I_0, I_1).$$

The entropy of the DoB distribution is

$$H(s) = -\sum_{s} P(s) \log P(s).$$

Incoming information narrows the distribution peak around a given value and thus
reduces the entropy. The change in the quality of information (mutual information)
can be calculated numerically as the change of the entropy from the initial state
to the state with information I_1:

$$MI(s; I_1) = H(s \mid I_0) - H(s \mid I_0, I_1).$$
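The fusion step and the entropy-based information measure can be sketched for a discrete attribute. The prior, the likelihood P(I_1 | s) of the new observation, and the use of base-2 logarithms (entropy in bits) are assumptions chosen for illustration. For a single observation the entropy difference can also be negative if the new information contradicts a sharp prior.

```python
import math

def fuse(prior, likelihood):
    """Bayesian fusion P(s | I0) -> P(s | I0, I1) for a discrete attribute s."""
    unnorm = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(unnorm.values())
    return {s: p / total for s, p in unnorm.items()}

def entropy(dob):
    """H(s) = -sum_s P(s) log2 P(s); zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in dob.values() if p > 0)

prior = {"Netbook": 1 / 3, "Ultraportable laptop": 1 / 3, "Thin laptop": 1 / 3}
likelihood = {"Netbook": 0.8, "Ultraportable laptop": 0.15, "Thin laptop": 0.05}
posterior = fuse(prior, likelihood)
mi = entropy(prior) - entropy(posterior)  # MI(s; I1) as defined above, in bits
```

Starting from a uniform prior over three types (log2 3 ≈ 1.58 bits), the observation concentrates the DoB on “Netbook” and the entropy drops, so the computed information gain is positive.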



3.2 Object Aging 



Elements of the real world are not persistent: an apple on a table will not stay in
its place forever. Each object in the model should therefore be reconfirmed after
some time or be stated to exist with a decreased DoB value (object aging). Object
aging represents the awareness that the real world changes and that these changes
need to be perceived. Each modelled object contains a DoB of existence [GHB08]. If
its value exceeds a threshold D_c, the object is considered to exist in the model,
at least as a blank object. If it falls below a level D_d, the object is deleted
from the world model.

The object’s life cycle is given in [GHB08] as a constantly falling DoB of
existence (e.g., an exponential decrease with aging factor F_aging) that rises at
each sensory reconfirmation (Figure 3.1(a)). The reconfirmation threshold D_r
ensures that the system is signaled when sensory validation is needed. The
validation itself can occur during an allowed time period determined by D_r - D_d
and the clock frequency. Similar to object aging, which decreases the existence DoB
value, attribute aging can disperse the attribute’s DoB distribution. For example,
if the DoB of the “temperature” attribute peaks sharply at 21°C, it becomes flatter
and wider as time passes, reflecting the increased uncertainty about the current
temperature (Figure 3.1(b)).
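A minimal sketch of both aging mechanisms follows. It assumes the exponential form D(t) = D_0 · exp(-F_aging · t) in continuous time instead of clock ticks, and a random-walk model in which an attribute's variance grows linearly with time. The threshold values and all function names are hypothetical.

```python
import math

def existence_dob(d0, f_aging, t):
    """Object aging: existence DoB decays as d0 * exp(-F_aging * t) (assumed)."""
    return d0 * math.exp(-f_aging * t)

def time_to_reach(d_from, d_to, f_aging):
    """Time for the existence DoB to decay from d_from down to d_to."""
    return math.log(d_from / d_to) / f_aging

def attribute_sigma(sigma0, diffusion, t):
    """Attribute aging: variance grows linearly in time (random-walk assumption)."""
    return math.sqrt(sigma0 ** 2 + diffusion * t)

# Hypothetical values: reconfirmation threshold D_r, deletion level D_d, F_aging
D_R, D_D, F = 0.6, 0.2, 0.05
validation_window = time_to_reach(D_R, D_D, F)  # time between D_r and D_d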

The concept of aging also helps in learning corrections. For example, if a system
has noticed a car driving over water (actually, over a large puddle), it can
classify the car as an amphibian (peaking the DoB distribution over the “amphibian”
type). As time passes, the system perceives that this object drives only over
ground (increasing the DoB peak over the “car” type), so it “forgets” about the
possibility of floating on water, much as a human being would.

Figure 3.1: DoB of existence (a) and attribute (b); panel (b): attribute aging (DoB
dispersion).

The aging factor F_aging can differ between classes. For example, a table is more
persistent than a cookie. Therefore, an aging correction called the “class
correction”, C_class, can be applied to the base aging factor F_0. A context (e.g.,
party or siesta) and other world parameters can also alter the aging, introducing a
set of further corrections C_context, ..., C_N to F_0:

$$F_{\text{aging}} = F_0 + C_{\text{class}} + C_{\text{context}} + \dots + C_N.$$



Since objects are deleted once their existence DoB falls below D_d, the aging factor
and its corrections determine memory time limits (i.e., how long the system can
retain an object) and enable a numerical estimate of memory usage at an arbitrary
point in time.
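Assuming an exponential decay of the existence DoB, D(t) = D_0 · exp(-F_aging · t), an object that is never reconfirmed is deleted after t = ln(D_0 / D_d) / F_aging. The Python sketch below combines this with the additive aging factor above and uses Little's law (mean number of stored objects = creation rate × mean lifetime) for a steady-state memory estimate. The decay form, the steady-state assumption, and the example corrections are illustrative, not taken from the source.

```python
import math

def aging_factor(f0, corrections):
    """F_aging = F_0 + C_class + C_context + ... + C_N (additive corrections)."""
    return f0 + sum(corrections)

def lifetime(f_aging, d_start, d_delete):
    """Time until an unconfirmed object decays from d_start down to D_d."""
    return math.log(d_start / d_delete) / f_aging

def expected_objects(creation_rate, mean_lifetime):
    """Little's law (steady state): stored objects = creation rate * lifetime."""
    return creation_rate * mean_lifetime

# Hypothetical base factor (per second) and class corrections
table_life = lifetime(aging_factor(0.01, [-0.005]), 1.0, 0.1)  # persistent class
cookie_life = lifetime(aging_factor(0.01, [0.04]), 1.0, 0.1)   # short-lived class
```

With these numbers, a table that is never reconfirmed is kept ten times longer than a cookie. If two cookie-like objects are created per second, about 2 × 46 ≈ 92 of them are held in memory on average.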



3.3 Progressive Mapping 

The dynamical models represent environment elements as objects and their
relations. Incoming sensory information, cognition results, and object aging
refresh the dynamic state of the model. Objects can be instantiated from classes of
the Class concepts domain and re-instantiated from derived specialization classes
each time new information arrives [GHB08]. This approach, though, is not optimal
for dynamic states due to the problems mentioned in Section 2. Instead, this
contribution proposes a dynamic description of objects: an on-the-fly assignment of
attributes.
The dynamic description approach developed in this contribution is called
Progressive Mapping (Proming). Incoming information (even if unclassified) is fused
into the existing model by means of Bayesian fusion. The atomic level of
information is an attribute, which describes an arbitrary parameter (e.g., position
or color) of an environment element in the form of a DoB distribution. Like a real
world attribute, each modelling attribute can have multiple values (e.g., a ball can
be black and white), so its DoB distribution can peak over multiple values. Objects
contain a set of attributes, which describe physical properties, semantic meaning,
or other information. Objects of one part of the environment (e.g., a room or area)
are grouped into a scene. The architecture is shown schematically in Figure 3.2.



Figure 3.2: World modelling architecture (a scene, e.g., Scene 1, contains objects;
each object contains attributes, each described by a DoB distribution P(a1),
P(a2), ...).

Each information piece has intrinsic properties, such as a name (e.g.,
“temperature”, “color” for attributes; “Andrey”, “coffee machine” for objects), a
DoB distribution, and others.

At the beginning, each object is created as an empty container. Incoming data is
fused into the existing maps with consistency and meaning checks between new and
existing information. The progression of information maps over time is shown in
Figure 3.3. The granularity of Proming levels is so fine that it is not reasonable
to distinguish them; the abstraction pyramid becomes quasi-continuous, as shown in
Figure 3.4.
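The container idea of Progressive Mapping can be sketched with a few Python dataclasses: a scene holds objects, an object starts as an empty container, and attributes are created on the fly the first time evidence about them arrives. The class names, the discrete DoB representation, treating values not mentioned by new evidence as uninformative, and raising an error on contradictory evidence (a stand-in for the consistency checks) are all assumptions of this sketch.

```python
from dataclasses import dataclass, field

def _normalize(weights):
    total = sum(weights.values())
    if total <= 0.0:
        # Stand-in for a consistency check: the evidence contradicts the model
        raise ValueError("inconsistent evidence")
    return {v: w / total for v, w in weights.items()}

@dataclass
class Attribute:
    name: str
    dob: dict  # discrete DoB: value -> probability (may peak over several values)

@dataclass
class ModelObject:
    name: str
    attributes: dict = field(default_factory=dict)  # created as an empty container

    def fuse(self, attr_name, likelihood):
        """Create the attribute on first evidence, otherwise fuse by Bayes' rule."""
        attr = self.attributes.get(attr_name)
        if attr is None:
            self.attributes[attr_name] = Attribute(attr_name, _normalize(likelihood))
            return
        # Values not mentioned by the new evidence get likelihood 1 (no information)
        attr.dob = _normalize({v: p * likelihood.get(v, 1.0) for v, p in attr.dob.items()})

@dataclass
class Scene:
    name: str
    objects: dict = field(default_factory=dict)

    def get_or_create(self, obj_name):
        return self.objects.setdefault(obj_name, ModelObject(obj_name))

kitchen = Scene("kitchen")
ball = kitchen.get_or_create("object 1")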

The Progressive Mapping mechanism eliminates the first two problems of the direct
classification approach (Section 2): instead of static class specializations,
flexible progressive data structures are employed.

Although Progressive Mapping delivers the flexibility of a dynamic description, it
has difficulties in representing prior knowledge. This issue can be solved by a
hybrid approach: the dynamical world model is represented by Proming, which is
linked to the Prior Knowledge domain (Figure 3.4).
Figure 3.3: Progression of a scene map over time with a typical life cycle of
an object.
Figure 3.4: Classification (left) and Proming (right) pyramids.
4 Domains Interchange



For correct cognition and semantic processing of environment elements, it is vital
to synchronize the dynamic models with the prior knowledge.

Although the Classification pyramid implies the classical object-oriented approach,
the modelling objects are not instantiated from these classes directly. Proming is
based on container objects, which can be assigned to specific classes by the
matching process described in Section 4.1. The container-object concept thus sets
aside the object-oriented methodology.



4.1 Class Matching 

One of the most important aspects of world modelling is the connection between the
dynamical model and prior knowledge. At this point, model objects with a set of
perceived attributes are recognized as objects of known classes (class matching) or
as known class instances (prior record matching). During matching, a set of object
attributes is compared to class attributes in the classification pyramid on a
probabilistic and semantic basis. An attribute deviation vector
σ(a_1, a_2, ..., a_N) determines the confidence level of the match to a known Class
concept. Class matching advances objects through the pyramid as soon as object
attributes match a predefined class, and it constantly links the object to prior
knowledge (Figure 3.4). For example, if an object is classified as a cup, the system
automatically receives the information that the object can be used to carry liquid,
has a stable surface on the bottom, and can be found in a kitchen. The matching
process can proceed recursively to lower levels as long as the attribute and
semantic deviation is less than some limit σ_matching.
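The recursive descent can be sketched as follows. In this hypothetical Python example, each class of a small pyramid lists expected numeric attributes as (mean, standard deviation), the deviation vector holds the normalized deviation of each shared attribute, and matching moves to the best-fitting child as long as its largest deviation stays below σ_matching. The pyramid, the attribute names, and the purely numeric deviation (the semantic part is omitted) are assumptions.

```python
# Hypothetical pyramid: class -> (expected attributes {name: (mean, std)}, children)
PYRAMID = {
    "blank": ({}, ["container"]),
    "container": ({"height_cm": (15.0, 10.0)}, ["cup", "bucket"]),
    "cup": ({"height_cm": (10.0, 2.0), "volume_l": (0.3, 0.1)}, []),
    "bucket": ({"height_cm": (30.0, 5.0), "volume_l": (10.0, 3.0)}, []),
}

def deviation_vector(observed, expected):
    """sigma(a_1, ..., a_N): normalized deviation of every shared attribute."""
    return [abs(observed[a] - m) / s for a, (m, s) in expected.items() if a in observed]

def match(observed, node="blank", sigma_matching=2.0):
    """Descend the pyramid while some child matches within sigma_matching."""
    best = None
    for child in PYRAMID[node][1]:
        dev = deviation_vector(observed, PYRAMID[child][0])
        if dev and max(dev) < sigma_matching and (best is None or max(dev) < best[1]):
            best = (child, max(dev))
    if best is None:
        return node  # lowest level reached for the current attributes
    return match(observed, best[0], sigma_matching)
```

A small object of about 9 cm height and 0.25 l volume descends to “cup”. An object known only by a height of 20 cm stays at “container”, because neither child fits well enough.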

A deduction of missing attributes can also be made upon class matching. For
example, if an autonomous system detects a ball of about 22 cm diameter with the
distinct black-and-white pattern of a truncated icosahedron, it can try to match the
prior knowledge and classify the ball type as P(t). The type distribution can, for
example, peak over “football”, P(t = F), and, due to perception uncertainty, over
“volleyball”, P(t = V). The missing DoB distribution for the attribute “weight”,
P(w), can be estimated from the DoB for the type, P(t), and the prior knowledge about
the weight attribute given the class type, P(w | t = X):

$$P(w) = \sum_{X} P(w \mid t = X)\, P(t = X).$$
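The deduction formula is a marginalization over the type. In the Python sketch below, the type DoB and the Gaussian prior knowledge P(w | t = X) are illustrative numbers chosen for the example (roughly the weights of a football and a volleyball), not values from the source.

```python
import math

def normal_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# DoB over the ball type after class matching (hypothetical values)
P_TYPE = {"football": 0.8, "volleyball": 0.2}

# Prior knowledge P(w | t = X) as Gaussians in grams (illustrative values)
P_W_GIVEN_T = {"football": (430.0, 10.0), "volleyball": (270.0, 5.0)}

def weight_dob(w):
    """P(w) = sum_X P(w | t = X) P(t = X)."""
    return sum(normal_pdf(w, *P_W_GIVEN_T[x]) * p for x, p in P_TYPE.items())

def expected_weight():
    """Mean of the deduced weight DoB."""
    return sum(P_W_GIVEN_T[x][0] * p for x, p in P_TYPE.items())
```

The deduced weight DoB is bimodal, with most of its mass around the football weight. The system thus obtains a weight estimate without ever having weighed the ball.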



A similar process of matching and deduction can also be useful for other tasks,
such as context recognition (e.g., party or siesta).

The abstraction pyramid enables a classification mechanism in which each modelled
object is assigned to an abstraction level. A model object progresses over time
(receiving more attributes and values) and can be matched to Class concepts of the
abstraction pyramid at any moment.

Objects of a world model can be viewed and processed on all levels from the
classified one up to the top (Figure 3.4). For example, a football can be processed
on the levels of football, ball, point cloud, bounding box, centroid, and blank
object.

The prior knowledge back-end represents not only the classification pyramids for
object specializations but also the semantic meanings of possible attributes and
relations. This helps to find inconsistencies, such as a measured temperature below
0 K or a door with two rotation axes.
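Such semantic checks can be sketched as rules attached to attributes, optionally restricted to a class. The rule format, the attribute names, and the messages in this Python example are hypothetical; only the two inconsistencies themselves come from the text.

```python
# Hypothetical semantic rules held by the back-end:
# (class the rule applies to, or "*" for all; attribute; predicate; message)
RULES = [
    ("*", "temperature_K", lambda v: v >= 0.0, "temperature below 0 K"),
    ("door", "rotation_axes", lambda v: v == 1, "a door has exactly one rotation axis"),
]

def find_inconsistencies(obj_class, attributes):
    """Return the messages of all rules violated by the object's attribute values."""
    return [
        message
        for cls, attr, is_valid, message in RULES
        if cls in ("*", obj_class) and attr in attributes and not is_valid(attributes[attr])
    ]
```

A door reported at -5 K with two rotation axes violates both rules. The same axis count on a cup is ignored, because the rule is bound to the door class.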



4.2 Dynamical Prior Records 

Prior records are either predefined or created dynamically by an autonomous system
during the perception and cognition processes. Since dynamical information is
affected by object and attribute aging, it can be lost after some time. There are
situations in which such a direct deletion of objects is not optimal. For example,
a cup is moved from a table into a dishwasher; after several hours, the cup should
be recognized again when the dishwasher is unloaded. For this case, the deletion of
the object can be performed in two steps. First, the object is shifted from the
world model to the dynamical prior records. This reduces the model, saves
processing time and system resources, and at the same time preserves the object.
After a while, unused objects of the prior records are ultimately removed (e.g., a
person who visited only once), similar to the object aging mechanism (Section 3.2).
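The two-step deletion can be sketched as follows. This Python example assumes a multiplicative decay per update step, a deletion level D_d for the world model, and a fixed retention time for the dynamical prior records; the class name and parameters are hypothetical.

```python
class TwoStepForgetting:
    """World model -> dynamical prior records -> ultimate removal."""

    def __init__(self, d_delete, record_retention):
        self.d_delete = d_delete                  # deletion level D_d
        self.record_retention = record_retention  # time kept in the records
        self.model = {}    # object id -> existence DoB
        self.records = {}  # object id -> time it was moved to the records

    def step(self, now, decay):
        for oid in list(self.model):
            self.model[oid] *= decay              # object aging
            if self.model[oid] < self.d_delete:
                del self.model[oid]
                self.records[oid] = now           # step 1: keep as a prior record
        for oid, since in list(self.records.items()):
            if now - since > self.record_retention:
                del self.records[oid]             # step 2: remove unused records

    def recognize(self, oid, dob):
        """A re-observation restores the object from the records, if present."""
        self.records.pop(oid, None)
        self.model[oid] = dob

memory = TwoStepForgetting(d_delete=0.2, record_retention=10)
memory.model["cup"] = 0.5
memory.step(now=1, decay=0.3)  # the cup falls below D_d and moves to the records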

The prior records can be organized in the same manner as the class and relation
concept domains. The matching of detected objects to those from the prior records
is similar to class matching (Section 4.1).



5 Experimental Tests 

The practical realization of the current contribution is performed within the
Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)
Sonderforschungsbereich (SFB, Collaborative Research Center) 588 “Humanoid Robots -
Learning and Cooperating Multimodal Robots” project (Figure 5.1). The world model
contains environment knowledge in the form of objects and relations and serves as
an information hub for all subsystems of the robot.




Figure 5.1: SFB 588 world modelling (Armar-III robot) [M2].



A reduced version of the world modelling was also implemented in the Network
Enabled Surveillance and Tracking (NEST) project, developed at the Fraunhofer
Institute for Information and Data Processing IITB: “The model represents relevant
information extracted from a large number of sensors of a surveillance system,
fused into a single comprehensive, dynamic model of the monitored area” [EGB08].

The world model in SFB 588 is realized in a modular, cross-platform (i.e., Windows
or Linux) architecture. The Core subsystem is responsible for the dynamic models and
the Back-End for the prior knowledge (Figure 5.2).
Figure 5.2: Object-oriented world modelling architecture.
The communication interface to the External system is realized via XML data
exchange. The XML coupling enables serialization for persistent storage, network
interchange, or snapshot recording.
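The following sketch shows how an object with discrete DoB attributes could be serialized to XML and read back with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`object`, `attribute`, `value`, `p`) are a hypothetical schema, not the OOWM interface format.

```python
import xml.etree.ElementTree as ET

def object_to_xml(name, attributes):
    """Serialize {attribute: {value: probability}} under a hypothetical schema."""
    obj = ET.Element("object", name=name)
    for attr_name, dob in attributes.items():
        attr = ET.SubElement(obj, "attribute", name=attr_name)
        for value, p in dob.items():
            ET.SubElement(attr, "value", p=repr(p)).text = value
    return ET.tostring(obj, encoding="unicode")

def object_from_xml(text):
    """Parse the XML produced above back into (name, attributes)."""
    obj = ET.fromstring(text)
    attributes = {
        attr.get("name"): {v.text: float(v.get("p")) for v in attr.findall("value")}
        for attr in obj.findall("attribute")
    }
    return obj.get("name"), attributes

xml_text = object_to_xml("cup", {"color": {"white": 0.7, "blue": 0.3}})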

The OOWM Core front-end consists of the Memory Store module and the Management and
Consistency module. The Memory Store module contains semantic networks of objects
with attributes of arbitrary type. It provides robust processing and clone storage,
independent of external objects, in a multi-threaded and transaction-safe manner.
The Core Management and Consistency module provides interfaces for external
subsystems and performs basic consistency checks (e.g., whether a single cup is
reported as two cups with overlapping spatial positions). It also communicates with
the Back-End subsystem.
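Clone storage that is independent of external objects can be sketched as a store that deep-copies on every write and read, with one lock guarding all accesses. This Python example is an assumption about one possible design, not the OOWM implementation; `MemoryStore` and `transaction` are hypothetical names. Here, `transaction` prepares all copies before touching the store and then applies them under the lock, so readers never see a partially applied batch.

```python
import copy
import threading

class MemoryStore:
    """Stores deep copies so callers never share mutable state with the model."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put(self, oid, obj):
        with self._lock:
            self._objects[oid] = copy.deepcopy(obj)

    def get(self, oid):
        with self._lock:
            return copy.deepcopy(self._objects[oid])

    def transaction(self, updates):
        """Apply a batch of updates under one lock, all at once."""
        with self._lock:
            staged = {oid: copy.deepcopy(obj) for oid, obj in updates.items()}
            self._objects.update(staged)

store = MemoryStore()
external_cup = {"color": "white"}
store.put("cup", external_cup)
external_cup["color"] = "red"  # changes outside the store do not leak in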

The OOWM Back-End subsystem consists of two parts: the Semantic Relations module
and the Management and Consistency module. The Semantic Relations module retains
the semantic networks of prior knowledge and the attribute schema. The Back-End
Management and Consistency module communicates with the Core subsystem, delivers
information on queries (matching), and performs basic consistency checks (e.g.,
checks for type mismatches).



6 Conclusion 



This contribution presents an object-oriented world modelling approach for
autonomous systems based on a hybrid of Progressive Mapping and classification. The
world modelling provides a foundation for permanent situation awareness and serves
as an information storage and hub for all subsystems. Progressive Mapping allows a
much more flexible dynamic description of environment elements than a direct
classification approach. Dynamic relations and prior knowledge are represented by
multiple semantic networks. All object attributes, as well as object existence, are
defined as probability distributions in the form of Degree of Belief distributions,
which allows information to be processed by means of Bayesian fusion. The class
matching approach enables matching of prior knowledge and model objects on a
probabilistic basis. A mechanism for constant information reconfirmation is given by
object and attribute aging. Aging is represented by a set of factors affecting the
DoB distributions, which allows object time limits and memory usage to be estimated
at an arbitrary moment. The overall architecture presented in this contribution was
developed within the DFG SFB 588 “Humanoid Robots - Learning and Cooperating
Multimodal Robots” project and can be used as a high-level memory model for any
autonomous or surveillance system.